Bellabeat Case Study

How Can a Wellness Technology Company Play It Smart?

Introduction

Bellabeat is a manufacturer of high-tech products focused on women’s health. Its founders developed beautifully designed technology to inform and inspire women around the world. Collecting data on activity, sleep, stress and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Bellabeat is a successful small company, but it has the potential to become a larger player in the global smart device market.

As a junior data analyst on the marketing analytics team, I have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers use their non-Bellabeat smart devices. The insights I uncover will help guide marketing strategy and could help unlock new growth opportunities for the company.

In order to answer the key business questions, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

Steps of the data analysis process

‣ 1. ASK

Executive Team (stakeholders)

Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer.
Sando Mur: Mathematician and Bellabeat’s cofounder.
Bellabeat marketing analytics team.

Statement of the business task

Analyze data to identify trends in how consumers use non-Bellabeat smart devices.
Identify which trends could apply to Bellabeat customers and how they could be applied to one of Bellabeat’s products.
Present proposals for Bellabeat’s marketing strategy based on the analysis.

‣ 2. PREPARE

Data Exploration

This data set (FitBit Fitness Tracker Data) was obtained from the MÖBIUS account on Kaggle, where it is published under the CC0: Public Domain license. The data is neither original nor current: it was collected by third parties through a survey distributed via Amazon Mechanical Turk between March 12, 2016 and May 12, 2016.

The data provided is secondary external data containing information from thirty-three (33) eligible Fitbit users who consented to submit personal tracking data, including minute-level output for physical activity, heart rate, and sleep monitoring. The information is organized into sixteen (16) .csv (comma-separated values) files. Each file contains structured quantitative and nominal data organized in tables of numbers, strings and boolean values. Some tables are in wide format, where each column holds a single variable with a specific data type and associated constraints, and others are in long format, where each subject has data in multiple rows.
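The difference between wide and long format can be illustrated with a minimal sketch. The table below is hypothetical (the values and the two-user sample are made up for illustration), and `pivot_longer()` from tidyr is only one possible way to reshape it.

```r
library(tidyr)

# Hypothetical wide table: one row per user, one column per variable
wide <- data.frame(
  Id = c(1, 2),
  VeryActiveMinutes = c(25, 10),
  LightlyActiveMinutes = c(200, 150)
)

# Long format: one row per user and variable, so each user spans several rows
long <- pivot_longer(
  wide,
  cols = c(VeryActiveMinutes, LightlyActiveMinutes),
  names_to = "Activity",
  values_to = "Minutes"
)

print(long)
```

The wide table has 2 rows and the long table has 4, because each user now appears once per activity variable.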

Data Credibility

I consider the data unreliable because it was not possible to identify the original source and extract the data from it. It was also not possible to find other databases from similar studies that could support my analysis. Furthermore, the sample is biased: it is not representative of the population as a whole and does not include demographic data such as age, gender or place of data collection. Although the source states that the sample covers two months, it contains data for only a single month.

I was not provided with data about Bellabeat products, so I could not analyze how consumers use Bellabeat’s own smart devices.

Without more recent and reliable sources of information, it is a challenge to extract what is needed for an analysis that is useful for decision making. Even so, I carry out the requested analysis and show the data analysis process step by step, ending with my high-level recommendations for Bellabeat’s marketing strategy.

Analysis to determine the Bellabeat product that most resembles FitBit data

We found that the Bellabeat Leaf product is the one we can compare the most with the data provided by FitBit. Therefore, we select it as a product to apply our insights.

Comparison of tracking features: FitBit vs. Bellabeat products

Brand	Device	Tracking Features
		Similar		Different
FitBit	watch	heart rate	sleeping monitoring	minute-level output
Bellabeat app	app	stress	sleep	menstrual cycle
Bellabeat Leaf	bracelet, necklace or clip	heart rate	sleep	respiratory rate, cardiac coherence, menstrual cycle
Bellabeat Time *	watch	stress	sleep	-
Bellabeat Spring	water bottle	-	-	hydration levels

*Bellabeat Time (watch) is not a product currently for sale according to its website.

‣ 3. PROCESS

Installing and loading Packages (R)

# Installation of all the packages needed for data cleaning and transformation

install.packages("tidyverse")
install.packages("lubridate")
install.packages("tidyr")
install.packages("readr")
install.packages("readxl")
install.packages("dplyr")
install.packages("Tmisc")
install.packages("knitr")
install.packages("yaml")
install.packages("janitor")
install.packages("writexl")
install.packages("gridExtra")

# Loading all the packages needed for data cleaning and transformation

library(tidyverse)
library(lubridate)
library(tidyr)
library(readr)
library(readxl)
library(dplyr)
library(Tmisc)
library(knitr)
library(yaml)
library(janitor)
library(writexl)
library(gridExtra)

Load and explore data

# Load the 6 useful files that resulted from cleaning, filtering and preliminary analysis of the 16 original files in Google Sheets

ActivityDays_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/ActivityDays_v3.xlsx")
CaloriesHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/CaloriesHours_v3.xlsx")
IntensitiesDays_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/IntensitiesDays_v3.xlsx")
IntensitiesHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/IntensitiesHours_v3.xlsx")
SleepHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/SleepHours_v3.xlsx")
StepsHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/StepsHours_v3.xlsx")

# Files preview

View(StepsHours_v3)
View(SleepHours_v3)
View(IntensitiesHours_v3)
View(IntensitiesDays_v3)
View(CaloriesHours_v3)
View(ActivityDays_v3)

Cleaning data

# Check the number of users in each table. Tables with fewer than 30 users were not used because they fall below the commonly used minimum sample size of 30. However, I kept the SleepHours table for reference analysis.

n_distinct(StepsHours_v3$Id)
[1] 33
n_distinct(SleepHours_v3$Id)
[1] 24
n_distinct(IntensitiesHours_v3$Id)
[1] 33
n_distinct(IntensitiesDays_v3$Id)
[1] 33
n_distinct(CaloriesHours_v3$Id)
[1] 33
n_distinct(ActivityDays_v3$Id)
[1] 33

# Check the structure and data type of each column in the original data frames

str(StepsHours_v3)
str(SleepHours_v3)
str(IntensitiesHours_v3)
str(IntensitiesDays_v3)
str(CaloriesHours_v3)
str(ActivityDays_v3)

# Convert Date-Time columns to POSIXct object if needed in each file

StepsHours_v3$ActivityHour <- as.POSIXct(StepsHours_v3$ActivityHour)
SleepHours_v3$SleepDay <- as.POSIXct(SleepHours_v3$SleepDay)
IntensitiesHours_v3$ActivityHour <- as.POSIXct(IntensitiesHours_v3$ActivityHour)
CaloriesHours_v3$ActivityHour <- as.POSIXct(CaloriesHours_v3$ActivityHour)

# Separate Date-Time columns into "Date" and "Time" columns in each file

StepsHours <- StepsHours_v3 %>%
  mutate(Date = as.Date(ActivityHour),              # Extract date component
         Time = format(ActivityHour, "%H:%M:%S"))   # Extract time component

SleepHours <- SleepHours_v3 %>%
  mutate(Date = as.Date(SleepDay),              # Extract date component
         Time = format(SleepDay, "%H:%M:%S"))   # Extract time component

IntensitiesHours <- IntensitiesHours_v3 %>%
  mutate(Date = as.Date(ActivityHour),              # Extract date component
         Time = format(ActivityHour, "%H:%M:%S"))   # Extract time component

CaloriesHours <- CaloriesHours_v3 %>%
  mutate(Date = as.Date(ActivityHour),              # Extract date component
         Time = format(ActivityHour, "%H:%M:%S"))   # Extract time component

# Convert data types. Remove rows with missing values

StepsHours$StepTotal <- as.numeric(StepsHours$StepTotal)    # Convert columns to numeric 
StepsHours <- na.omit(StepsHours)                           # Remove rows with missing values

SleepHours <- SleepHours %>%                                # Convert columns to numeric 
  mutate(across(c(TotalMinutesAsleep, TotalTimeInBed, TotalHoursAsleep, TotalHoursInBed), as.numeric))
SleepHours <- na.omit(SleepHours)                           # Remove rows with missing values

IntensitiesHours <- IntensitiesHours %>%                    # Convert columns to numeric
  mutate(across(c(TotalIntensity, AverageIntensity), as.numeric))
IntensitiesHours <- na.omit(IntensitiesHours)               # Remove rows with missing values

IntensitiesDays <- IntensitiesDays_v3 %>%                   # Convert columns to numeric
  mutate(across(c(SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, VeryActiveMinutes, SedentaryActiveDistance, LightActiveDistance, ModeratelyActiveDistance, VeryActiveDistance), as.numeric))                                                
IntensitiesDays <- na.omit(IntensitiesDays)                 # Remove rows with missing values

CaloriesHours$Calories <- as.numeric(CaloriesHours$Calories)    # Convert to numeric 
CaloriesHours <- na.omit(CaloriesHours)                         # Remove rows with missing values

ActivityDays <- ActivityDays_v3 %>%
  mutate(across(c(TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories), as.numeric))   # Convert to numeric 
ActivityDays <- na.omit(ActivityDays)                       # Remove rows with missing values

# Identify and remove duplicates if necessary

sum(duplicated(StepsHours))                         # Identify duplicates
[1] 0                                               # No duplicates
sum(duplicated(SleepHours))
[1] 3
SleepHours <- distinct(SleepHours)                  # Eliminate duplicates and keep the result
SleepHours
# A tibble: 410 × 9
           Id SleepDay            TotalSleepRecords TotalMinutesAsleep TotalTimeInBed TotalHoursAsleep TotalHoursInBed
        <dbl> <dttm>                          <dbl>              <dbl>          <dbl>            <dbl>           <dbl>
 1 1503960366 2016-04-12 00:00:01                 1                327            346             5.45            5.77
 2 1503960366 2016-04-13 00:00:00                 2                384            407             6.4             6.78
 3 1503960366 2016-04-15 00:00:00                 1                412            442             6.87            7.37
 4 1503960366 2016-04-16 00:00:00                 2                340            367             5.67            6.12
 5 1503960366 2016-04-17 00:00:00                 1                700            712            11.7            11.9 
 6 1503960366 2016-04-19 00:00:00                 1                304            320             5.07            5.33
 7 1503960366 2016-04-20 00:00:00                 1                360            377             6               6.28
 8 1503960366 2016-04-21 00:00:00                 1                325            364             5.42            6.07
 9 1503960366 2016-04-23 00:00:00                 1                361            384             6.02            6.4 
10 1503960366 2016-04-24 00:00:00                 1                430            449             7.17            7.48
# ℹ 400 more rows
# ℹ 2 more variables: Date <date>, Time <chr>
# ℹ Use `print(n = ...)` to see more rows

sum(duplicated(IntensitiesHours))
[1] 0
sum(duplicated(IntensitiesDays_v3))
[1] 0
sum(duplicated(CaloriesHours))
[1] 0
sum(duplicated(ActivityDays_v3))
[1] 0

# Number of observations in each dataframe (SleepHours: 413 rows minus the 3 duplicates removed above)

nrow(StepsHours)
[1] 21165
nrow(SleepHours)
[1] 410
nrow(IntensitiesHours)
[1] 22099
nrow(IntensitiesDays)
[1] 940
nrow(CaloriesHours)
[1] 22099
nrow(ActivityDays)
[1] 940
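
The cleaning steps above rely on functions that return a new data frame rather than modifying the original in place. The following is a minimal sketch on a hypothetical four-row table (the name `sleep_log` and its values are made up) showing that the result of `na.omit()` and `distinct()` has to be assigned to keep the cleaned rows.

```r
library(dplyr)

# Hypothetical sleep log with one exact duplicate row and one missing value
sleep_log <- data.frame(
  Id = c(1, 1, 2, 3),
  TotalMinutesAsleep = c(327, 327, 412, NA)
)

# Count exact duplicate rows before cleaning
n_duplicates <- sum(duplicated(sleep_log))

# Both functions return a new data frame, so the result is assigned to `clean`
clean <- sleep_log %>%
  na.omit() %>%
  distinct()

print(n_duplicates)
print(nrow(clean))
```

Calling `distinct(sleep_log)` without an assignment would only print the de-duplicated rows and leave `sleep_log` unchanged.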

‣ 4. ANALYZE

Organizing data

“StepsHours”

# GRAPHIC 1

# Add a column to indicate the day of the week

StepsHours$Day_of_Week <- weekdays(as.Date(StepsHours$Date))
head(StepsHours)                                          # View the updated data frame

# Create a pivot table for StepTotal by Day_of_Week and Time

pivot_table <- aggregate(StepTotal ~ Day_of_Week + Time, data = StepsHours, FUN = sum)

# Create the visualization (heatmap)

ggplot(data = pivot_table, aes(x = Time, y = Day_of_Week, fill = StepTotal)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "blue") +
  labs(title = "Steps Taken by Hour of the Day and Day of the Week",
       x = "Hour of the Day", y = "Day of the Week") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Export the new data frame to Excel.

write_xlsx(StepsHours, "StepsHours_df.xlsx")

The day of greatest physical activity according to the number of steps taken is Wednesday between 5:00 p.m. and 7:00 p.m., followed by Saturday between 12:00 p.m. and 2:00 p.m., and finally Tuesday from 11:00 a.m. to 12:00 p.m. and from 5:00 p.m. to 7:00 p.m.

# GRAPHIC 2

# Work on a copy of the cleaned data frame and convert the Id column to a factor with levels in the original order

StepsHours_df <- StepsHours
StepsHours_df$Id <- factor(StepsHours_df$Id, levels = unique(StepsHours_df$Id))

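# Create a summary data frame with the approximate average daily steps per user
# (each user's total is divided by a fixed 33; n_distinct(Date) would give the exact number of days recorded per user)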

summary_steps <- StepsHours_df %>%
  group_by(Id) %>%
  summarise(total_steps = sum(StepTotal / 33)) %>%
  mutate(activity_type = factor(case_when(
    total_steps < 5000 ~   "Sedentary Active",
    total_steps >= 5000 &  total_steps < 7500  ~  "Lightly Active",
    total_steps >= 7500 &  total_steps < 10000 ~  "Moderately Active",
    total_steps >= 10000 ~ "Very Active"), levels = c("Sedentary Active", "Lightly Active","Moderately Active","Very Active"))) %>% 
  arrange(-total_steps)

# Convert Id to a factor to ensure it's treated as categorical

summary_steps$Id <- factor(summary_steps$Id)

# Create the bar plot

steps_barplot <- ggplot(data = summary_steps, aes(x = Id, y = total_steps, fill = activity_type)) +
  geom_bar(stat = "identity", color = "blue") +
  geom_hline(yintercept = 5000, linetype = "dashed", color = "red", size = 0.5) +
  labs(title = "Average Daily Steps per User",
       x = "User ID",
       y = "Average Daily Steps") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
  theme(axis.text.y = element_text(size = 6)) +
  scale_y_continuous(labels = scales::comma) +
  annotate("text", x=11,y=13000,label="-Minimum daily steps: 5000", color = "red")

# Display the plot

print(steps_barplot)

Recent studies suggest that 5,000 steps per day, or even fewer, may be enough to see a health benefit. Against this threshold, only 33% of users are sedentary (fewer than 5,000 steps per day), 30% are Lightly Active (5,000 to 7,499 steps per day), and 37% are Moderately Active or Very Active (7,500 steps or more per day).
From this we can conclude that the majority of users are physically active at a healthy level. However, we cannot say with certainty that this reflects the general population, since we do not know FitBit’s market niche; possibly the majority of FitBit users are physically active people.
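
The classification by daily steps can be sketched on a small hypothetical sample. The data below are made up, and the sketch averages each user's daily totals over the days actually recorded, which is one possible way to compute the daily mean and not necessarily the calculation used for the percentages above. The step thresholds are the ones from the `case_when()` call.

```r
# Hypothetical hourly step records for two users over two days
steps <- data.frame(
  Id        = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Date      = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13", "2016-04-13",
                        "2016-04-12", "2016-04-12", "2016-04-13", "2016-04-13")),
  StepTotal = c(3000, 5000, 4000, 4000, 1000, 2000, 1500, 1500)
)

# Total steps per user and day, then the mean of those daily totals per user
daily <- aggregate(StepTotal ~ Id + Date, data = steps, FUN = sum)
avg   <- aggregate(StepTotal ~ Id, data = daily, FUN = mean)

# right = FALSE makes each interval closed on the left: [lower, upper)
avg$activity_type <- cut(
  avg$StepTotal,
  breaks = c(0, 5000, 7500, 10000, Inf),
  labels = c("Sedentary Active", "Lightly Active", "Moderately Active", "Very Active"),
  right = FALSE
)

print(avg)
```

In this toy sample, user A averages 8,000 steps per day and user B averages 3,000, so they fall into different activity types.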

“SleepHours”

# Add a column to indicate the day of the week

SleepHours$Day_of_Week <- weekdays(as.Date(SleepHours$Date))
head(SleepHours)                     # View the updated data frame

# Create a vector of custom breaks for the Y-axis and define the levels for the "Day_of_Week" factor

custom_breaks <- c(5, 6, 7, 8, 9, 10) 
weekday_levels <- c("Sunday","Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

# Convert "Day_of_Week" to a factor with the specified levels

SleepHours$Day_of_Week <- factor(SleepHours$Day_of_Week, levels = weekday_levels)

# Create the plot with custom Y-axis breaks, multiple fill colors, and black lines/points

ggplot(SleepHours, aes(x = Day_of_Week, y = TotalHoursAsleep, fill = factor(Day_of_Week))) +
  geom_boxplot(color = "black") +  # Specify the color of the boxplot outlines
  geom_point(color = "black") +     # Specify the color of the points
  geom_line(color = "black", size = 0.1) +  # Specify the color and size of the lines
  labs(title = "Total Hours Asleep by Day of the Week",
       x = "Day of the Week", y = "Total Hours Asleep") +
  scale_y_continuous(breaks = custom_breaks) +
  scale_fill_manual(values = c("brown2", "lightsalmon", "yellow1", "yellowgreen", "skyblue", "royalblue", "mediumpurple1")) +
  theme_bw()

# Export the new data frame to Excel.

write_xlsx(SleepHours, "SleepHours_df.xlsx")

There is no clear linear relationship between hours of sleep and the days of greatest physical activity. Users sleep the most on Saturdays and Sundays, but this may simply be because it is the weekend.
Most users meet the minimum recommended by the Centers for Disease Control and Prevention (CDC): at least 7 hours of sleep per night for adults between 18 and 60 years old.
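
The statement that most users meet the 7-hour minimum can be checked numerically as well as visually. The following is a minimal sketch on hypothetical records (three made-up users, two nights each); the column name `TotalHoursAsleep` follows the SleepHours table, and everything else is an assumption for illustration.

```r
# Hypothetical sleep records: one row per user and night
sleep <- data.frame(
  Id = c("A", "A", "B", "B", "C", "C"),
  TotalHoursAsleep = c(7.5, 8.0, 6.0, 6.5, 7.0, 7.2)
)

# Average hours asleep per user
avg_sleep <- aggregate(TotalHoursAsleep ~ Id, data = sleep, FUN = mean)

# TRUE when a user's average reaches the 7-hour minimum
avg_sleep$meets_minimum <- avg_sleep$TotalHoursAsleep >= 7

# Share of users meeting the minimum (the mean of a logical vector is a proportion)
share <- mean(avg_sleep$meets_minimum)

print(avg_sleep)
print(share)
```

In this toy sample two of the three users reach the minimum on average.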

“IntensitiesHours”

# Add a column to indicate the day of the week

IntensitiesHours$Day_of_Week <- weekdays(as.Date(IntensitiesHours$Date))

# Extract hour from the Time column

IntensitiesHours$Hour <- as.integer(substr(IntensitiesHours$Time, 1, 2))

# Summarize TotalIntensity per hour (pooled across all days, so each hour gets a single value shared by every day of the week)

IntensitiesHours <- IntensitiesHours %>%
  group_by(Hour) %>%
  mutate(TotalIntensityPerHour = sum(TotalIntensity)) %>%
  mutate(AverageIntensityPerHour = mean(TotalIntensity))

# Create a heatmap of intensity by hour and day of the week

ggplot(IntensitiesHours, aes(x = factor(Hour), y = factor(Day_of_Week), fill = TotalIntensityPerHour)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Intensity Patterns by Hour and Day of the Week",
       x = "Hour of the Day", y = "Day of the Week",
       fill = "Total Intensity Per Hour") +
  theme(axis.text.x = element_text(hjust = 1, size = 6)) + 
  theme(axis.text.y = element_text(size = 8))

# Export the new data frame to Excel.

write_xlsx(IntensitiesHours, "IntensitiesHours_df.xlsx")

Because intensity is summed per hour across all days, the heatmap shows the same hourly pattern for every day of the week. The hours of greatest physical intensity fall into two blocks: from 10:00 a.m. to 2:00 p.m. and from 5:00 p.m. to 7:00 p.m., with the most marked intensity at 12:00 p.m. and in the late-afternoon block.
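
To compare days with each other, intensity would need to be aggregated by day and hour together. The following is a minimal sketch on hypothetical records (the values are made up), using the same `aggregate()` approach applied to StepsHours; it is an illustration, not the calculation behind the heatmap above.

```r
# Hypothetical hourly intensity records
intensity <- data.frame(
  Day_of_Week    = c("Monday", "Monday", "Monday", "Saturday", "Saturday", "Saturday"),
  Hour           = c(12, 12, 18, 12, 18, 18),
  TotalIntensity = c(10, 20, 5, 40, 15, 25)
)

# One value per (day, hour) cell instead of one value per hour
by_day_hour <- aggregate(TotalIntensity ~ Day_of_Week + Hour, data = intensity, FUN = sum)

print(by_day_hour)
```

With this grouping, Monday at 12:00 and Saturday at 12:00 get separate totals, so a heatmap built on `by_day_hour` can differ from one day to the next.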

“IntensitiesDays”

# GRAPHIC 1

# Add a column to indicate the day of the week

IntensitiesDays$Day_of_Week <- weekdays(as.Date(IntensitiesDays$ActivityDay))

# Converting columns to numeric

IntensitiesDays <- IntensitiesDays %>%
  mutate(across(c(SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, VeryActiveMinutes, SedentaryActiveDistance, LightActiveDistance, ModeratelyActiveDistance, VeryActiveDistance), as.numeric))

# Create scatter plots with regression lines for each activity level

sedentary_plot <- ggplot(IntensitiesDays, aes(x = SedentaryMinutes, y = SedentaryActiveDistance)) +
  geom_point(color = "cornflowerblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "maroon") +
  labs(title = "Sedentary Activity",
       x = "Sedentary Minutes", y = "Sedentary Distance") +
  facet_wrap(~Day_of_Week) +
  theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
  theme(axis.text.y = element_text(size = 5))  # Adjust y-axis text size here

lightly_active_plot <- ggplot(IntensitiesDays, aes(x = LightlyActiveMinutes, y = LightActiveDistance)) +
  geom_point(color = "cornflowerblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "yellow1") +
  labs(title = "Lightly Active Activity",
       x = "Lightly Active Minutes", y = "Light Active Distance") +
  facet_wrap(~Day_of_Week) +
  theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
  theme(axis.text.y = element_text(size = 5))  # Adjust y-axis text size here

fairly_active_plot <- ggplot(IntensitiesDays, aes(x = FairlyActiveMinutes, y = ModeratelyActiveDistance)) +
  geom_point(color = "cornflowerblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "green") +
  labs(title = "Fairly Active Activity",
       x = "Fairly Active Minutes", y = "Moderately Active Distance") +
  facet_wrap(~Day_of_Week) +
  theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
  theme(axis.text.y = element_text(size = 5))  # Adjust y-axis text size here

very_active_plot <- ggplot(IntensitiesDays, aes(x = VeryActiveMinutes, y = VeryActiveDistance)) +
  geom_point(color = "cornflowerblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "darkmagenta") +
  labs(title = "Very Active Activity",
       x = "Very Active Minutes", y = "Very Active Distance") +
  facet_wrap(~Day_of_Week) +
  theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
  theme(axis.text.y = element_text(size = 5))  # Adjust y-axis text size here

# Arrange plots in a grid

grid.arrange(sedentary_plot, lightly_active_plot, fairly_active_plot, very_active_plot, nrow = 2)

# Export the new data frame to Excel.

write_xlsx(IntensitiesDays, "IntensitiesDays_df.xlsx")

There is a positive correlation between active minutes and distance traveled. For sedentary activity there is no linear correlation, since time increases but distance stays at zero. For Lightly Active activity, distance increases as walking time increases. For Fairly Active and Very Active activity, both the activity time and the distance traveled are shorter than at the Lightly Active level, but the positive correlation is maintained.
The days with the most activity at each intensity level are: Lightly Active: Thursday, Saturday and Sunday; Fairly Active: Tuesday and Sunday; Very Active: Saturday and Sunday.
The majority of users have healthy physical activity, complying with the minimum recommended by the World Health Organization (WHO): adults should get at least 150–300 minutes of moderate-intensity aerobic physical activity per week (2.5 hours per week at the lower bound, or about 21 minutes per day), or 75–150 minutes of vigorous-intensity activity per week. The WHO also recommends an equivalent combination of moderate- and vigorous-intensity activity.
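
The weekly minimum can be checked with simple arithmetic. The following sketch uses one hypothetical user-week (made-up values) and rests on two assumptions that are not part of the data set: Fairly Active minutes are treated as moderate intensity and Very Active minutes as vigorous intensity, and one vigorous minute is counted as two moderate minutes, in line with the 150-versus-75-minute equivalence in the recommendation.

```r
# Hypothetical daily minutes for one user over one week
week <- data.frame(
  FairlyActiveMinutes = c(20, 0, 15, 30, 10, 0, 25),  # treated as moderate intensity (assumption)
  VeryActiveMinutes   = c(10, 0, 20, 0, 5, 0, 15)     # treated as vigorous intensity (assumption)
)

moderate <- sum(week$FairlyActiveMinutes)
vigorous <- sum(week$VeryActiveMinutes)

# Assumption: one vigorous minute counts as two moderate minutes
moderate_equivalent <- moderate + 2 * vigorous

# Lower bound of the weekly recommendation: 150 moderate-equivalent minutes
meets_who_minimum <- moderate_equivalent >= 150

print(moderate_equivalent)
print(meets_who_minimum)
```

Here the week adds up to 100 moderate and 50 vigorous minutes, which is 200 moderate-equivalent minutes and therefore above the 150-minute lower bound.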

# GRAPHIC 2

# Define the order of levels for the "Activity" variable

activity_order <- c("SedentaryActiveDistance", "LightActiveDistance", 
                    "ModeratelyActiveDistance", "VeryActiveDistance")

# Define the levels for the "Day_of_Week" factor in the desired order

weekday_levels <- c("Sunday","Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

# Convert "Day_of_Week" to a factor with the specified levels

IntensitiesDays$Day_of_Week <- factor(IntensitiesDays$Day_of_Week, levels = weekday_levels)

# Calculate mean values for each day of the week across all users

IntensitiesDays_mean <- IntensitiesDays %>%
  group_by(Day_of_Week) %>%
  summarise_at(vars(SedentaryActiveDistance, LightActiveDistance, 
                    ModeratelyActiveDistance, VeryActiveDistance), mean, na.rm = TRUE)

# Reshape the data into long format for distance variables

IntensitiesDays_long_distance <- gather(IntensitiesDays_mean, key = "Activity", value = "Distance",
                                        SedentaryActiveDistance, LightActiveDistance, ModeratelyActiveDistance, VeryActiveDistance)

# Convert "Activity" to a factor with specified order of levels

IntensitiesDays_long_distance$Activity <- factor(IntensitiesDays_long_distance$Activity, 
                                                 levels = activity_order)

# Create the bar plot for distance

distance_plot <- ggplot(IntensitiesDays_long_distance, aes(x = Day_of_Week, y = Distance, fill = Activity)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Distance Levels by Day of Week",
       x = "Activity Day", y = "Distance") +
  scale_fill_manual(values = c("SedentaryActiveDistance" = "cyan3", 
                               "LightActiveDistance" = "olivedrab3",
                               "ModeratelyActiveDistance" = "mediumorchid1",
                               "VeryActiveDistance" = "coral")) +  
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
  theme(axis.text.y = element_text(size = 6))

# Stack both plots vertically (activity_plot, the matching plot for active minutes, must be created before this call)

grid.arrange(activity_plot, distance_plot, ncol = 1)

Minutes by activity level: Lightly Active minutes remain constant throughout the week; Fairly Active and Very Active minutes are much lower and show only a very slight variation on Mondays and Tuesdays. Distance by activity level: Light activity predominates on most days of the week, with higher values on Saturdays and lower values on Mondays. Moderate activity is constant, and Very Active distance is slightly greater on Tuesdays, Wednesdays and Saturdays.
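
The code for the minutes plot (`activity_plot`) does not appear alongside the distance plot. The following is a sketch of how it might be built, assuming it mirrors `distance_plot` with minutes in place of distance. The data frame `minutes_mean` and its values are hypothetical, and `pivot_longer()` is used here where the analysis uses `gather()`.

```r
library(ggplot2)
library(tidyr)

# Hypothetical mean minutes per day of week (made-up values for two days)
minutes_mean <- data.frame(
  Day_of_Week          = factor(c("Sunday", "Monday"), levels = c("Sunday", "Monday")),
  LightlyActiveMinutes = c(180, 190),
  FairlyActiveMinutes  = c(15, 12),
  VeryActiveMinutes    = c(20, 25)
)

# Reshape to long format: one row per day and activity level
minutes_long <- pivot_longer(
  minutes_mean,
  cols = c(LightlyActiveMinutes, FairlyActiveMinutes, VeryActiveMinutes),
  names_to = "Activity",
  values_to = "Minutes"
)

# Grouped bar plot with the same layout as distance_plot
activity_plot <- ggplot(minutes_long, aes(x = Day_of_Week, y = Minutes, fill = Activity)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Minutes Levels by Day of Week",
       x = "Activity Day", y = "Minutes") +
  theme_minimal()
```

In the case study, the per-day means would come from IntensitiesDays grouped by `Day_of_Week` rather than from a hand-typed table.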

“CaloriesHours”

# Add a column to indicate the day of the week

CaloriesHours$Day_of_Week <- weekdays(as.Date(CaloriesHours$Date))

# Define the levels for the "Day_of_Week" factor in the desired order

weekday_levels <- c("Sunday","Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

# Convert "Day_of_Week" to a factor with the specified levels

CaloriesHours$Day_of_Week <- factor(CaloriesHours$Day_of_Week, levels = weekday_levels)

# Create a heatmap

heatmap_plot <- ggplot(CaloriesHours, aes(x = Time, y = Day_of_Week, fill = Calories)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red", name = "Calories") +
  labs(title = "Calories Consumed per Hour and Day of Week",
       x = "Hour of the Day", y = "Day of the Week") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
        legend.position = "bottom") +
  theme(axis.text.y = element_text(size = 7))

# Display the plot

print(heatmap_plot)

# Export the new data frame to Excel.

write_xlsx(CaloriesHours, "CaloriesHours_df.xlsx")

Calories burned per hour are most notable (above 500 calories) on Saturdays between 12:00 p.m. and 2:00 p.m., Fridays at 2:00 p.m., Wednesdays at 5:00 p.m. and Sundays at 4:00 p.m.

“ActivityDays”

# Add a column to indicate the day of the week

ActivityDays$Day_of_Week <- weekdays(as.Date(ActivityDays$ActivityDate))

# Create a new column that adds LightlyActiveMinutes, FairlyActiveMinutes, and VeryActiveMinutes, plus each user's average of that total

ActivityDays <- ActivityDays %>%
  group_by(Id) %>%
  mutate(TotalMinutes = LightlyActiveMinutes + FairlyActiveMinutes + VeryActiveMinutes,
         AvgTotalMinutes = mean(TotalMinutes, na.rm = TRUE)) %>%
  ungroup()                                                   # Drop the grouping once the per-user average is computed

# Convert Id column to factor with levels in the original order

ActivityDays$Id <- factor(ActivityDays$Id, levels = unique(ActivityDays$Id))

# Create a bubble chart

ggplot(ActivityDays, aes(x = Day_of_Week, y = Calories, size = TotalMinutes, color = Id)) +
  geom_point(alpha = 0.6) +  # Adjust transparency for better visibility
  geom_hline(yintercept = 1800, linetype = "dashed", color = "magenta3", size = 0.5) +
  geom_hline(yintercept = 2000, linetype = "dashed", color = "blue3", size = 0.5) +
  scale_size_continuous(range = c(0.3, 5)) +  # Adjust the range of bubble sizes
  labs(title = "Bubble Chart of Activity Data",
       x = "Day of Week",
       y = "Total Calories",
       size = "Total Minutes",
       color = "Users") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
  theme(axis.text.y = element_text(size = 6)) +
  scale_y_continuous(labels = scales::comma) +
  annotate("text", x=3,y=600,label="-Minimum daily calories for women: 1,800", color = "magenta3") +
  annotate("text", x=2.88,y=400,label="-Minimum daily calories for men: 2,000", color = "blue3")

# Export the new data frame to Excel.

write_xlsx(ActivityDays, "ActivityDays_df.xlsx")

A healthy daily calorie burn ranges from 1,800 to 2,400 calories for adult women and from 2,000 to 3,200 calories for adult men. The graph shows that users burn between 1,300 and 3,800 calories per day on average. A few users burn less than 1,000 calories on some day of the week, while the trend points to a larger group burning more than 4,000 calories daily. Overall, users burn a healthy average number of calories per day.

‣ 5. SHARE

Findings

1. Limitation of the data:

The data set has inaccurate or missing data, and we do not know the demographics of the sample. Null or missing values could be due to users not wearing their devices, and some discrepancies between tables could be due to statistical sampling error.

2. Trends from other smart devices:

Steps-hour-day:

The day of greatest physical activity according to the number of steps taken is Wednesday between 5:00 p.m. and 7:00 p.m., followed by Saturday between 12:00 p.m. and 2:00 p.m., and finally Tuesday from 11:00 a.m. to 12:00 p.m. and from 5:00 p.m. to 7:00 p.m.
Only 33% of users are sedentary (fewer than 5,000 steps per day), 30% are Lightly Active (5,000 to 7,499 steps per day), and 37% are Moderately Active or Very Active (7,500 steps or more per day).

Sleep-hour-day:

There is no clear linear relationship between hours of sleep and the days of greatest physical activity. Users sleep the most on Saturdays and Sundays, but this may simply be because it is the weekend.
Most users meet the minimum recommended by the Centers for Disease Control and Prevention (CDC): at least 7 hours of sleep per night for adults between 18 and 60 years old.

Intensity-hour-day:

The hours of greatest physical intensity fall into two blocks: from 10:00 a.m. to 2:00 p.m. and from 5:00 p.m. to 7:00 p.m., with the most marked intensity at 12:00 p.m. and in the late-afternoon block.
The majority of users have healthy physical activity, complying with the minimum recommended by the World Health Organization (WHO): adults should get at least 150–300 minutes of moderate-intensity aerobic physical activity per week (2.5 hours per week at the lower bound, or about 21 minutes per day), or 75–150 minutes of vigorous-intensity activity per week. The WHO also recommends an equivalent combination of moderate- and vigorous-intensity activity.

Calories-hour-day:

Calories burned per hour are most notable (above 500 calories) on Saturdays between 12:00 p.m. and 2:00 p.m., Fridays at 2:00 p.m., Wednesdays at 5:00 p.m. and Sundays at 4:00 p.m.
The graph shows that users burn between 1,300 and 3,800 calories per day on average. A few users burn less than 1,000 calories on some day of the week, while the trend points to a larger group burning more than 4,000 calories daily. Overall, users burn a healthy average number of calories per day.

Conclusions

The majority of users maintain healthy physical activity, complying with the calorie burning and the minimum daily activity recommended by the World Health Organization (WHO) and with the minimum hours of sleep recommended by the Centers for Disease Control and Prevention (CDC).

This positive result raises three questions for me: 1. Could it be because a high percentage of FitBit users are already physically active people? 2. Are people motivated to be more physically active just by using the FitBit device? 3. Is FitBit using some type of strategy to motivate its users to be more physically active? Answering these questions requires more information, so that a deeper analysis of the data can yield useful conclusions.

Trends Bellabeat could apply to its customers and products

Bellabeat Leaf is the product currently offered by the company that shares the tracking features of the FitBit watch and adds others, such as respiratory rate and menstrual cycle.

Taking this into account, the insights obtained from FitBit could be applied to our product.

Improve data quality:
- Improve data collection methods to ensure accurate and complete data, emphasizing demographic data for appropriate sample selection that generates useful analyses.
Feature enhancement:
- Incorporate other functions into our health tracking device, such as tracking steps, physical activity intensity, and calories burned. The device should provide information and recommendations tailored to individual users based on their activity levels and sleep patterns.
User engagement:
- Educate users about the importance of maintaining awareness of their body’s health and personal care, using the measurements provided by our product.
- Provide personalized recommendations and notifications to encourage users to achieve their health and fitness goals.
Data visualization:
- Develop user-friendly, interactive data visualizations (charts, graphs, and summary statistics) within the health tracking device to help users understand their activity trends, sleep patterns, and calories burned.
Continuous improvement:
- Collect user feedback about their experience with the health tracking device and use it to make continuous improvements to the device’s features and functionality.
- Monitor user engagement and device adoption to identify areas for improvement and optimization.
Partnerships and collaboration:
- Explore strategic alliances with health and wellness organizations to leverage their expertise and resources in promoting healthy lifestyles and behavior changes.
- Collaborate with healthcare professionals to ensure device functionalities align with evidence-based recommendations for physical activity, sleep, nutrition, hormonal cycles, and other important aspects of women’s health.

Proposals for Bellabeat’s marketing strategies

1. Identification of the target audience:

Redefine your target audience based on demographics, lifestyle and health behavior. This could include women of any age who are fitness enthusiasts, health conscious, athletes, or looking to improve their overall well-being.

2. Brand positioning:

Clearly define the unique value proposition of your health tracking device and how it addresses the needs and pain points of your female audience.
Highlight the features and benefits that differentiate your device from the competition and position it as a must-have tool for achieving health and fitness goals.

3. Multi-channel marketing approach:

Develop a multi-channel marketing strategy to reach your target audience across multiple platforms and channels, including digital, social media, email marketing and traditional channels.
Leverage online advertising, influencer partnerships, content marketing, and search engine optimization (SEO) to increase brand visibility and reach.

4. Content marketing and thought leadership:

Create valuable and informative content related to women’s health, fitness, nutrition and wellness to establish your brand as a thought leader in the industry.
Share insightful articles, blog posts, videos, and infographics that provide tips, advice, and resources to help users achieve their health and fitness goals.

5. User engagement and community building:

Foster a sense of community among users of your health tracking device by creating online forums, women’s social media groups, and community events where users can connect, share their experiences, and support each other.
Encourage user-generated content and testimonials to showcase real-life success stories and build trust in your brand.

6. Partnerships and collaborations:

Explore strategic partnerships with influential women in fitness, health professionals, gyms, wellness centers and other relevant organizations to expand your reach and access new audiences.
Collaborate with healthcare providers and insurance companies to promote the use of your health tracking device as part of wellness and preventive care programs.

7. Data privacy and security guarantee:

Emphasize the importance of data privacy and security to assure users that their personal health information will be protected and handled with the utmost care.
Implement strong security measures and comply with industry standards and regulations to maintain the confidentiality and integrity of user data.

8. Feedback and continuous improvements:

Solicit user feedback through surveys, reviews, and customer support channels to identify areas for improvement and innovation.
Actively listen to user feedback and incorporate their suggestions and preferences into product development and marketing initiatives.

‣ 6. ACT

Possible next steps for stakeholders to take based on my findings

Firstly, evaluate the most suitable process for collecting complete, accurate, and demographically defined data, so that useful conclusions can be drawn for decision-making.

Secondly, once the data has been analyzed and the findings and conclusions have been shared, a meeting must be held with the design and programming team to determine the feasibility and costs of implementing improvements in the functionalities and design of the Bellabeat product.

Thirdly, a meeting must be held with the marketing department to define the advertising and marketing strategy that will position the product as unique and communicate the benefits of its use to women.

Thank you!