How Can a Wellness Technology Company Play It Smart?

Introduction

Bellabeat is a manufacturer of high-tech products focused on women’s health. Its founders developed beautifully designed technology to inform and inspire women around the world. Collecting data on activity, sleep, stress and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Bellabeat is a successful small company, but it has the potential to become a larger player in the global smart device market.

As a junior data analyst working in the marketing analytics team, I have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers use their non-Bellabeat smart devices. The insights I uncover will help guide marketing strategy and could help unlock new growth opportunities for the company.

In order to answer the key business questions, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

 

Steps of the data analysis process

‣ 1. ASK

Executive Team (stakeholders)

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer.
  • Sando Mur: Mathematician and Bellabeat’s cofounder.
  • Bellabeat marketing analytics team.

 

Statement of the business task

  • Analyze data to identify trends in how consumers use non-Bellabeat smart devices.
  • Identify which trends could apply to Bellabeat customers and how they could be applied to one of its products.
  • Present proposals for Bellabeat’s marketing strategy based on the analysis.

 

‣ 2. PREPARE

Data Exploration

 

This data set, FitBit Fitness Tracker Data, was obtained from the MÖBIUS account on Kaggle, where it is published under a CC0: Public Domain license. The data is neither original nor current: it was collected by third parties through a survey distributed via Amazon Mechanical Turk between March 12, 2016 and May 12, 2016.

The data provided is secondary, external data that contains information from thirty-three (33) eligible Fitbit users who consented to submit personal tracking data, including minute-level output for physical activity, heart rate, and sleep monitoring. The information is organized into sixteen (16) .csv (comma-separated values) files. Each file contains structured, quantitative and nominal data organized in tables that hold numbers, strings and boolean values. Some tables are in wide format, where each column contains a single variable with a specific data type and associated constraints; others are in long format, where each subject has data in multiple rows.
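To make the difference between wide and long format concrete, here is a minimal sketch in base R. The values and the two minute columns are hypothetical toy data, not rows taken from the FitBit files.

```r
# Toy wide table (hypothetical values): one row per user, one column per variable
wide <- data.frame(
  Id = c(1, 2),
  VeryActiveMinutes = c(25, 0),
  SedentaryMinutes = c(700, 1100)
)

# Wide -> long: one row per user and variable, so each user appears in multiple rows
long <- data.frame(
  Id = rep(wide$Id, times = 2),
  Activity = rep(c("VeryActiveMinutes", "SedentaryMinutes"), each = nrow(wide)),
  Minutes = c(wide$VeryActiveMinutes, wide$SedentaryMinutes)
)

print(long)
```

With the tidyverse loaded, `tidyr::pivot_longer(wide, cols = -Id, names_to = "Activity", values_to = "Minutes")` should produce an equivalent long table.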

 

Data Credibility

 

I consider the data to be unreliable because it was not possible to identify the original source or extract the data from it. It was also not possible to find other databases from similar studies that could support my analysis. Furthermore, the sample is biased: it is not representative of the population as a whole, and it does not include demographic data such as age, gender or place of data collection. Finally, although the description states that the sample covers 2 months, the files contain data for only a single month.
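The coverage claim can be checked directly against a date column. The following is a minimal sketch on a toy table; the rows are hypothetical, and only the column names `Id` and `ActivityDate` follow the daily activity table used later in this report.

```r
# Toy stand-in for a daily table (hypothetical rows)
toy <- data.frame(
  Id = c(1, 1, 2, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-05-12", "2016-04-12", "2016-04-30"))
)

# First and last recorded dates, and the number of calendar days they span
date_span <- range(toy$ActivityDate)
days_covered <- as.integer(diff(date_span)) + 1

# Number of distinct days recorded by each user
days_per_user <- tapply(toy$ActivityDate, toy$Id, function(d) length(unique(d)))

print(date_span)
print(days_covered)
print(days_per_user)
```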

I was also not provided with any data on Bellabeat’s own products, so I could not directly analyze how consumers use Bellabeat smart devices.

Without more recent and reliable sources of information, it is a challenge to extract what is needed for an analysis that is useful for decision making. Even so, I carry out what was requested and show the data analysis process step by step, and finally propose my high-level recommendations for Bellabeat’s marketing strategy.

 

Analysis to determine the Bellabeat product that most resembles FitBit data

 

We found that the Bellabeat Leaf product is the one we can compare the most with the data provided by FitBit. Therefore, we select it as a product to apply our insights.

 

Comparison of tracking features: FitBit vs. Bellabeat products

Brand              Device                        Similar tracking features       Different tracking features
FitBit             watch                         heart rate, sleep monitoring    minute-level output
Bellabeat app      app                           stress, sleep                   menstrual cycle
Bellabeat Leaf     bracelet, necklace or clip    heart rate, sleep               respiratory rate, cardiac coherence, menstrual cycle
Bellabeat Time *   watch                         stress, sleep                   -
Bellabeat Spring   water bottle                  -                               hydration levels

* Bellabeat Time (watch) is not currently for sale according to the Bellabeat website.

 

‣ 3. PROCESS

Installing and loading Packages (R)

# Install all the packages needed for data cleaning and transformation
install.packages("tidyverse")
install.packages("lubridate")
install.packages("tidyr")
install.packages("readr")
install.packages("readxl")
install.packages("dplyr")
install.packages("Tmisc")
install.packages("janitor")
install.packages("writexl")
install.packages("gridExtra")
# Load all the packages needed for data cleaning and transformation
library(tidyverse)
library(lubridate)
library(tidyr)
library(readr)
library(readxl)
library(dplyr)
library(Tmisc)
library(knitr)
library(yaml)
library(janitor)
library(writexl)
library(gridExtra)

Load and explore data

# Load the 6 files used in the analysis; they result from the earlier cleaning, filtering and analysis of the 16 original files in Google Sheets
ActivityDays_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/ActivityDays_v3.xlsx")
CaloriesHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/CaloriesHours_v3.xlsx")
IntensitiesDays_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/IntensitiesDays_v3.xlsx")
IntensitiesHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/IntensitiesHours_v3.xlsx")
SleepHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/SleepHours_v3.xlsx")
StepsHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/StepsHours_v3.xlsx")
# Files preview 
View(StepsHours_v3)
View(SleepHours_v3)
View(IntensitiesHours_v3)
View(IntensitiesDays_v3)
View(CaloriesHours_v3)
View(ActivityDays_v3)

Cleaning data

# Check the number of users in each table. Tables with fewer than 30 users were set aside because the sample is too small to support reliable conclusions. However, I kept the SleepHours table for reference analysis.
n_distinct(StepsHours_v3$Id)
[1] 33
n_distinct(SleepHours_v3$Id)
[1] 24
n_distinct(IntensitiesHours_v3$Id)
[1] 33
n_distinct(IntensitiesDays_v3$Id)
[1] 33
n_distinct(CaloriesHours_v3$Id)
[1] 33
n_distinct(ActivityDays_v3$Id)
[1] 33
# Check the structure and data type of each column in the original data frames
str(StepsHours_v3)
str(SleepHours_v3)
str(IntensitiesHours_v3)
str(IntensitiesDays_v3)
str(CaloriesHours_v3)
str(ActivityDays_v3)
# Convert Date-Time columns to POSIXct object if needed in each file
StepsHours_v3$ActivityHour <- as.POSIXct(StepsHours_v3$ActivityHour)
SleepHours_v3$SleepDay <- as.POSIXct(SleepHours_v3$SleepDay)
IntensitiesHours_v3$ActivityHour <- as.POSIXct(IntensitiesHours_v3$ActivityHour)
CaloriesHours_v3$ActivityHour <- as.POSIXct(CaloriesHours_v3$ActivityHour)
# Separate Date-Time columns into "Date" and "Time" columns in each file 
StepsHours <- StepsHours_v3 %>%
  mutate(Date = as.Date(ActivityHour),              # Extract date component
         Time = format(ActivityHour, "%H:%M:%S"))   # Extract time component

SleepHours <- SleepHours_v3 %>%
  mutate(Date = as.Date(SleepDay),              # Extract date component
         Time = format(SleepDay, "%H:%M:%S"))   # Extract time component

IntensitiesHours <- IntensitiesHours_v3 %>%
  mutate(Date = as.Date(ActivityHour),              # Extract date component
         Time = format(ActivityHour, "%H:%M:%S"))   # Extract time component

CaloriesHours <- CaloriesHours_v3 %>%
  mutate(Date = as.Date(ActivityHour),              # Extract date component
         Time = format(ActivityHour, "%H:%M:%S"))   # Extract time component        
# Convert data types. Remove rows with missing values 
StepsHours$StepTotal <- as.numeric(StepsHours$StepTotal)    # Convert columns to numeric 
StepsHours <- na.omit(StepsHours)                           # Remove rows with missing values

SleepHours <- SleepHours %>%                                # Convert columns to numeric 
  mutate(across(c(TotalMinutesAsleep, TotalTimeInBed, TotalHoursAsleep, TotalHoursInBed), as.numeric))
SleepHours <- na.omit(SleepHours)                           # Remove rows with missing values

IntensitiesHours <- IntensitiesHours %>%                    # Convert columns to numeric
  mutate(across(c(TotalIntensity, AverageIntensity), as.numeric))
IntensitiesHours <- na.omit(IntensitiesHours)               # Remove rows with missing values

IntensitiesDays <- IntensitiesDays_v3 %>%                   # Convert columns to numeric
  mutate(across(c(SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, VeryActiveMinutes, SedentaryActiveDistance, LightActiveDistance, ModeratelyActiveDistance, VeryActiveDistance), as.numeric))                                                
IntensitiesDays <- na.omit(IntensitiesDays)                 # Remove rows with missing values

CaloriesHours$Calories <- as.numeric(CaloriesHours$Calories)    # Convert to numeric 
CaloriesHours <- na.omit(CaloriesHours)                         # Remove rows with missing values

ActivityDays <- ActivityDays_v3 %>%
  mutate(across(c(TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories), as.numeric))   # Convert to numeric 
ActivityDays <- na.omit(ActivityDays)                       # Remove rows with missing values
# Identify and remove duplicates if necessary 
sum(duplicated(StepsHours))                         # Identify duplicates
[1] 0                                               # No duplicates
sum(duplicated(SleepHours))
[1] 3
SleepHours <- distinct(SleepHours)                  # Remove duplicates; assign the result so the removal persists
SleepHours
# A tibble: 410 × 9
           Id SleepDay            TotalSleepRecords TotalMinutesAsleep TotalTimeInBed TotalHoursAsleep TotalHoursInBed
        <dbl> <dttm>                          <dbl>              <dbl>          <dbl>            <dbl>           <dbl>
 1 1503960366 2016-04-12 00:00:01                 1                327            346             5.45            5.77
 2 1503960366 2016-04-13 00:00:00                 2                384            407             6.4             6.78
 3 1503960366 2016-04-15 00:00:00                 1                412            442             6.87            7.37
 4 1503960366 2016-04-16 00:00:00                 2                340            367             5.67            6.12
 5 1503960366 2016-04-17 00:00:00                 1                700            712            11.7            11.9 
 6 1503960366 2016-04-19 00:00:00                 1                304            320             5.07            5.33
 7 1503960366 2016-04-20 00:00:00                 1                360            377             6               6.28
 8 1503960366 2016-04-21 00:00:00                 1                325            364             5.42            6.07
 9 1503960366 2016-04-23 00:00:00                 1                361            384             6.02            6.4 
10 1503960366 2016-04-24 00:00:00                 1                430            449             7.17            7.48
# ℹ 400 more rows
# ℹ 2 more variables: Date <date>, Time <chr>
# ℹ Use `print(n = ...)` to see more rows

sum(duplicated(IntensitiesHours))
[1] 0
sum(duplicated(IntensitiesDays_v3))
[1] 0
sum(duplicated(CaloriesHours))
[1] 0
sum(duplicated(ActivityDays_v3))
[1] 0
# Number of observations in each data frame
nrow(StepsHours)
[1] 21165
nrow(SleepHours)                                    # 413 rows before the 3 duplicates were removed
[1] 410
nrow(IntensitiesHours)
[1] 22099
nrow(IntensitiesDays)
[1] 940
nrow(CaloriesHours)
[1] 22099
nrow(ActivityDays)
[1] 940
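The user-count check at the start of this cleaning step can be wrapped in a small helper so that every table is screened in the same way. This is only a sketch: `count_users` and the toy tables are hypothetical names and values, and the cutoff of 30 users is the rule of thumb used in this report.

```r
# Hypothetical helper: count distinct users per table and flag tables below a cutoff
count_users <- function(tables, min_users = 30) {
  n <- vapply(tables, function(df) length(unique(df$Id)), integer(1))
  data.frame(
    table = names(tables),
    users = unname(n),
    enough = unname(n >= min_users)
  )
}

# Toy tables (hypothetical Ids): 33 users in one, 24 users in the other
toy_tables <- list(
  Steps = data.frame(Id = 1:33),
  Sleep = data.frame(Id = rep(1:24, times = 2))
)

user_counts <- count_users(toy_tables)
print(user_counts)
```

On the real data, the same helper could be called on a named list of the six data frames loaded above.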

 

‣ 4. ANALYZE

Organizing data

“StepsHours”
# GRAPHIC 1
# Add a column to indicate the day of the week
StepsHours$Day_of_Week <- weekdays(as.Date(StepsHours$Date))
head(StepsHours)                                          # View the updated data frame
# Create a pivot table for StepTotal by Day_of_Week and Time
pivot_table <- aggregate(StepTotal ~ Day_of_Week + Time, data = StepsHours, FUN = sum)
# Create the visualization (heatmap)
ggplot(data = pivot_table, aes(x = Time, y = Day_of_Week, fill = StepTotal)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "blue") +
  labs(title = "Steps Taken by Hour of the Day and Day of the Week",
       x = "Hour of the Day", y = "Day of the Week") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Export the new data frame to Excel.
write_xlsx(StepsHours, "StepsHours_df.xlsx")

  • Measured by the number of steps taken, the period of greatest physical activity is Wednesday between 5:00 p.m. and 7:00 p.m., followed by Saturday between 12:00 p.m. and 2:00 p.m., and finally Tuesday from 11:00 a.m. to 12:00 p.m. and from 5:00 p.m. to 7:00 p.m.
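The busiest day and hour can also be read from the aggregated table rather than from the heatmap alone. A minimal sketch, assuming the same `Day_of_Week`, `Time` and `StepTotal` columns as above; the rows are hypothetical toy values.

```r
# Toy hourly steps (hypothetical values)
toy_steps <- data.frame(
  Day_of_Week = c("Wednesday", "Wednesday", "Saturday", "Saturday"),
  Time = c("17:00:00", "18:00:00", "12:00:00", "13:00:00"),
  StepTotal = c(900, 1200, 1000, 800)
)

# Same aggregation as the pivot table above: total steps per day and hour
totals <- aggregate(StepTotal ~ Day_of_Week + Time, data = toy_steps, FUN = sum)

# Sort from busiest to quietest cell and keep the top one
ranked <- totals[order(-totals$StepTotal), ]
top_cell <- ranked[1, ]
print(top_cell)
```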

 

# GRAPHIC 2
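# Work on a copy of the cleaned data frame (the same data exported to StepsHours_df.xlsx)
StepsHours_df <- StepsHours
# Convert Id column to factor with levels in the original order
StepsHours_df$Id <- factor(StepsHours_df$Id, levels = unique(StepsHours_df$Id))
# Create a summary data frame with the approximate average daily steps per user.
# Each user's total is divided by 33, the divisor used in this analysis; dividing by
# the number of days each user was tracked would give a more precise daily average.
summary_steps <- StepsHours_df %>%
  group_by(Id) %>%
  summarise(total_steps = sum(StepTotal / 33)) %>%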
  mutate(activity_type = factor(case_when(
    total_steps < 5000 ~   "Sedentary Active",
    total_steps >= 5000 &  total_steps < 7500  ~  "Lightly Active",
    total_steps >= 7500 &  total_steps < 10000 ~  "Moderately Active",
    total_steps >= 10000 ~ "Very Active"), levels = c("Sedentary Active", "Lightly Active","Moderately Active","Very Active"))) %>% 
  arrange(-total_steps)
# Convert Id to a factor to ensure it's treated as categorical
summary_steps$Id <- factor(summary_steps$Id)
# Create the bar plot
steps_barplot <- ggplot(data = summary_steps, aes(x = Id, y = total_steps, fill = activity_type)) +
  geom_bar(stat = "identity", color = "blue") +
  geom_hline(yintercept = 5000, linetype = "dashed", color = "red", size = 0.5) +
  labs(title = "Total Steps per User",
       x = "User ID",
       y = "Total Steps") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
  theme(axis.text.y = element_text(size = 6)) +
  scale_y_continuous(labels = scales::comma) +
  annotate("text", x=11,y=13000,label="-Minimum daily steps: 5000", color = "red")
# Display the plot
print(steps_barplot)

  • Recent studies suggest that 5,000 steps per day, or even fewer, may be enough to bring a health benefit. Against that benchmark, only 33% of users are sedentary (fewer than 5,000 steps per day), 30% are Lightly Active (5,000 to 7,499 steps per day), and 37% are Moderately Active or Very Active (7,500 steps or more per day).

  • We can conclude that most users are physically active at a healthy level. However, we cannot say with certainty that this reflects the general population, because we do not know FitBit’s market niche; it may be that most FitBit users are already physically active people.
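The percentages per category follow from the step thresholds used in the code above. A minimal base R sketch of that arithmetic, with hypothetical average daily steps for six users:

```r
# Hypothetical average daily steps for six users
avg_daily_steps <- c(3200, 4800, 5600, 7400, 8200, 12500)

# Same thresholds as the case_when() above; right = FALSE makes each
# interval include its lower bound, e.g. [5000, 7500)
activity_type <- cut(
  avg_daily_steps,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("Sedentary Active", "Lightly Active", "Moderately Active", "Very Active"),
  right = FALSE
)

# Number of users and percentage of users in each category
counts <- table(activity_type)
share <- round(100 * prop.table(counts), 1)
print(counts)
print(share)
```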

 

“SleepHours”
# Add a column to indicate the day of the week
SleepHours$Day_of_Week <- weekdays(as.Date(SleepHours$Date))
head(SleepHours)                     # View the updated data frame
# Create a vector of custom breaks for the Y-axis and define the levels for the "Day_of_Week" factor 
custom_breaks <- c(5, 6, 7, 8, 9, 10) 
weekday_levels <- c("Sunday","Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
# Convert "Day_of_Week" to a factor with the specified levels
SleepHours$Day_of_Week <- factor(SleepHours$Day_of_Week, levels = weekday_levels)
# Create the plot with custom Y-axis breaks, multiple fill colors, and black lines/points
ggplot(SleepHours, aes(x = Day_of_Week, y = TotalHoursAsleep, fill = factor(Day_of_Week))) +
  geom_boxplot(color = "black") +  # Specify the color of the boxplot outlines
  geom_point(color = "black") +     # Specify the color of the points
  geom_line(color = "black", size = 0.1) +  # Specify the color and size of the lines
  labs(title = "Total Hours Asleep by Day of the Week",
       x = "Day of the Week", y = "Total Hours Asleep") +
  scale_y_continuous(breaks = custom_breaks) +
  scale_fill_manual(values = c("brown2", "lightsalmon", "yellow1", "yellowgreen", "skyblue", "royalblue", "mediumpurple1")) +
  theme_bw()
# Export the new data frame to Excel.
write_xlsx(SleepHours, "SleepHours_df.xlsx")

  • There is no clear linear correlation between hours of sleep and the days of greatest physical activity. Users sleep the most on Saturdays and Sundays, but this may simply be because it is the weekend.

  • Most users meet the minimum recommended by the Centers for Disease Control and Prevention (CDC): at least 7 hours of sleep per night for adults between 18 and 60 years old.
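The statement that most users meet the 7-hour minimum can be expressed as a proportion. A minimal sketch on hypothetical toy data, assuming the `Id` and `TotalHoursAsleep` columns of the SleepHours table:

```r
# Toy sleep records (hypothetical values): two nights for each of three users
toy_sleep <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3),
  TotalHoursAsleep = c(7.5, 8.0, 6.0, 6.5, 7.0, 7.2)
)

# Average hours asleep per user
mean_sleep <- tapply(toy_sleep$TotalHoursAsleep, toy_sleep$Id, mean)

# Share of users whose average reaches the 7-hour minimum
share_meeting <- mean(mean_sleep >= 7)
print(mean_sleep)
print(share_meeting)
```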

 

“IntensitiesHours”
# Add a column to indicate the day of the week
IntensitiesHours$Day_of_Week <- weekdays(as.Date(IntensitiesHours$Date))
# Extract hour from the Time column
IntensitiesHours$Hour <- as.integer(substr(IntensitiesHours$Time, 1, 2))
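# Summarize TotalIntensity per hour
# Note: grouping by Hour alone gives every day of the week the same value for a given hour;
# group by Day_of_Week as well to compare days with each other
IntensitiesHours <- IntensitiesHours %>%
  group_by(Hour) %>%
  mutate(TotalIntensityPerHour = sum(TotalIntensity),
         AverageIntensityPerHour = mean(TotalIntensity)) %>%
  ungroup()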
# Create a heatmap of intensity by hour and day of the week
ggplot(IntensitiesHours, aes(x = factor(Hour), y = factor(Day_of_Week), fill = TotalIntensityPerHour)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Intensity Patterns by Hour and Day of the Week",
       x = "Hour of the Day", y = "Day of the Week",
       fill = "Total Intensity Per Hour") +
  theme(axis.text.x = element_text(hjust = 1, size = 6)) + 
  theme(axis.text.y = element_text(size = 8))  
# Export the new data frame to Excel.
write_xlsx(IntensitiesHours, "IntensitiesHours_df.xlsx")

  • The same pattern repeats every day of the week: the hours of greatest physical intensity fall into two blocks, one from 10:00 a.m. to 2:00 p.m. and another from 5:00 p.m. to 7:00 p.m., with the most marked intensity at 12:00 p.m. and in the late-afternoon block.
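Because the code above summarizes intensity by hour only, each hour carries the same value on every day of the heatmap. To compare days with each other, the summary could be computed per day and hour instead. This is a sketch of that alternative on hypothetical toy rows, not the computation behind the chart above.

```r
# Toy hourly intensities (hypothetical values)
toy_int <- data.frame(
  Day_of_Week = c("Monday", "Monday", "Tuesday", "Tuesday"),
  Hour = c(12L, 12L, 12L, 18L),
  TotalIntensity = c(20, 30, 10, 40)
)

# Mean intensity for each combination of day and hour
by_day_hour <- aggregate(TotalIntensity ~ Day_of_Week + Hour, data = toy_int, FUN = mean)
print(by_day_hour)
```

Using `by_day_hour` as the data for `geom_tile()` would give each day its own values per hour.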

 

“IntensitiesDays”
# GRAPHIC 1
# Add a column to indicate the day of the week
IntensitiesDays$Day_of_Week <- weekdays(as.Date(IntensitiesDays$ActivityDay))
# Convert columns to numeric
IntensitiesDays <- IntensitiesDays %>%
  mutate(across(c(SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, VeryActiveMinutes, SedentaryActiveDistance, LightActiveDistance, ModeratelyActiveDistance, VeryActiveDistance), as.numeric))
# Create scatter plots with regression lines for each activity level
sedentary_plot <- ggplot(IntensitiesDays, aes(x = SedentaryMinutes, y = SedentaryActiveDistance)) +
  geom_point(color = "cornflowerblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "maroon") +
  labs(title = "Sedentary Activity",
       x = "Sedentary Minutes", y = "Sedentary Distance") +
  facet_wrap(~Day_of_Week) +
  theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
  theme(axis.text.y = element_text(size = 5))  # Adjust y-axis text size here

lightly_active_plot <- ggplot(IntensitiesDays, aes(x = LightlyActiveMinutes, y = LightActiveDistance)) +
  geom_point(color = "cornflowerblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "yellow1") +
  labs(title = "Lightly Active Activity",
       x = "Lightly Active Minutes", y = "Light Active Distance") +
  facet_wrap(~Day_of_Week) +
  theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
  theme(axis.text.y = element_text(size = 5))  # Adjust y-axis text size here

fairly_active_plot <- ggplot(IntensitiesDays, aes(x = FairlyActiveMinutes, y = ModeratelyActiveDistance)) +
  geom_point(color = "cornflowerblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "green") +
  labs(title = "Fairly Active Activity",
       x = "Fairly Active Minutes", y = "Moderately Active Distance") +
  facet_wrap(~Day_of_Week) +
  theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
  theme(axis.text.y = element_text(size = 5))  # Adjust y-axis text size here

very_active_plot <- ggplot(IntensitiesDays, aes(x = VeryActiveMinutes, y = VeryActiveDistance)) +
  geom_point(color = "cornflowerblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "darkmagenta") +
  labs(title = "Very Active Activity",
       x = "Very Active Minutes", y = "Very Active Distance") +
  facet_wrap(~Day_of_Week) +
  theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
  theme(axis.text.y = element_text(size = 5))  # Adjust y-axis text size here
# Arrange plots in a grid
grid.arrange(sedentary_plot, lightly_active_plot, fairly_active_plot, very_active_plot, nrow = 2)
# Export the new data frame to Excel.
write_xlsx(IntensitiesDays, "IntensitiesDays_df.xlsx")
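The relationship drawn by each regression line can also be summarized with a correlation coefficient. A minimal sketch with hypothetical values, using column names from the IntensitiesDays table:

```r
# Toy daily records (hypothetical values)
toy_days <- data.frame(
  LightlyActiveMinutes = c(100, 150, 200, 250, 300),
  LightActiveDistance = c(2.0, 3.1, 3.9, 5.2, 6.0),
  SedentaryMinutes = c(600, 700, 800, 900, 1000),
  SedentaryActiveDistance = c(0, 0, 0, 0, 0)
)

# Pearson correlation between active minutes and distance
r_light <- cor(toy_days$LightlyActiveMinutes, toy_days$LightActiveDistance)

# With a distance that is always zero the correlation is undefined:
# cor() returns NA and warns that the standard deviation is zero
r_sedentary <- suppressWarnings(
  cor(toy_days$SedentaryMinutes, toy_days$SedentaryActiveDistance)
)

print(r_light)
print(r_sedentary)
```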

  • There is a positive correlation between active time and distance traveled. For sedentary activity there is no linear correlation, since time increases while distance stays at zero. For light activity, distance increases as walking time increases. For Fairly Active and Very Active activity, both the activity time and the distance traveled are shorter than for light activity, but the positive correlation still holds.

  • The days with the most activity at each intensity are: Lightly Active: Thursday, Saturday and Sunday; Fairly Active: Tuesday and Sunday; Very Active: Saturday and Sunday.

  • The majority of users have healthy physical activity, meeting the minimum recommended by the World Health Organization (WHO): at least 150 minutes (2.5 hours) of moderate-intensity aerobic physical activity per week, or about 21 minutes per day. The full recommendation for adults is 150–300 minutes of moderate-intensity activity per week, or 75–150 minutes of vigorous-intensity activity, or an equivalent combination of the two.
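The WHO figures reduce to simple arithmetic: 150 minutes per week is about 21 minutes per day, and since 75 vigorous minutes stand in for 150 moderate minutes, one vigorous minute counts as two moderate ones. The sketch below applies that to hypothetical weekly totals. Treating FitBit's "Fairly Active" minutes as moderate and "Very Active" minutes as vigorous is an assumption made for illustration only.

```r
# Hypothetical weekly totals for three users
weekly <- data.frame(
  Id = c(1, 2, 3),
  FairlyActiveMinutes = c(90, 20, 160),   # assumed to be moderate intensity
  VeryActiveMinutes = c(40, 10, 0)        # assumed to be vigorous intensity
)

# Moderate-equivalent minutes: each vigorous minute counts double
weekly$EquivalentMinutes <- weekly$FairlyActiveMinutes + 2 * weekly$VeryActiveMinutes

# Does the user reach the weekly minimum of 150 moderate-equivalent minutes?
weekly$MeetsWHO <- weekly$EquivalentMinutes >= 150

# The weekly minimum expressed per day
daily_target <- 150 / 7

print(weekly)
print(round(daily_target, 1))
```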

 

# GRAPHIC 2
# Define the order of levels for the "Activity" variable
activity_order <- c("SedentaryActiveDistance", "LightActiveDistance", 
                    "ModeratelyActiveDistance", "VeryActiveDistance")
# Define the levels for the "Day_of_Week" factor in the desired order
weekday_levels <- c("Sunday","Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
# Convert "Day_of_Week" to a factor with the specified levels
IntensitiesDays$Day_of_Week <- factor(IntensitiesDays$Day_of_Week, levels = weekday_levels)
# Calculate mean values for each day of the week across all users
IntensitiesDays_mean <- IntensitiesDays %>%
  group_by(Day_of_Week) %>%
  summarise_at(vars(SedentaryActiveDistance, LightActiveDistance, 
                    ModeratelyActiveDistance, VeryActiveDistance), mean, na.rm = TRUE)
# Reshape the data into long format for distance variables
IntensitiesDays_long_distance <- gather(IntensitiesDays_mean, key = "Activity", value = "Distance",
                                        SedentaryActiveDistance, LightActiveDistance,
                                        ModeratelyActiveDistance, VeryActiveDistance)
# Convert "Activity" to a factor with specified order of levels
IntensitiesDays_long_distance$Activity <- factor(IntensitiesDays_long_distance$Activity, 
                                                 levels = activity_order)
# Create the bar plot for distance
distance_plot <- ggplot(IntensitiesDays_long_distance, aes(x = Day_of_Week, y = Distance, fill = Activity)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Distance Levels by Day of Week",
       x = "Activity Day", y = "Distance") +
  scale_fill_manual(values = c("SedentaryActiveDistance" = "cyan3", 
                               "LightActiveDistance" = "olivedrab3",
                               "ModeratelyActiveDistance" = "mediumorchid1",
                               "VeryActiveDistance" = "coral")) +  
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
  theme(axis.text.y = element_text(size = 6)) 
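# Stack both plots vertically. activity_plot is the equivalent bar plot for the
# *Minutes variables and must be created the same way before this call.
grid.arrange(activity_plot, distance_plot, ncol = 1)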

  • Minutes by activity level: light activity remains constant throughout the week; Fairly Active and Very Active minutes are much lower and show only a very slight variation on Mondays and Tuesdays. Distance by activity level: light activity predominates on most days of the week, with higher values on Saturdays and lower values on Mondays. Moderate activity is constant, and Very Active activity shows slightly greater distances on Tuesdays, Wednesdays and Saturdays.
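The code for the minutes plot (`activity_plot`) is not shown above. The following is a sketch of how it might be built by mirroring the distance plot; the toy values, the choice of the three active-minute columns and the plot title are assumptions, not the original code.

```r
# Toy daily records (hypothetical values)
toy <- data.frame(
  Day_of_Week = c("Monday", "Monday", "Tuesday", "Tuesday"),
  LightlyActiveMinutes = c(180, 220, 200, 160),
  FairlyActiveMinutes = c(10, 20, 30, 10),
  VeryActiveMinutes = c(20, 0, 40, 20)
)
minute_cols <- c("LightlyActiveMinutes", "FairlyActiveMinutes", "VeryActiveMinutes")

# Mean minutes per day of the week across all users
means <- aggregate(toy[minute_cols], by = list(Day_of_Week = toy$Day_of_Week), FUN = mean)

# Reshape to long format: one row per day and activity level
minutes_long <- data.frame(
  Day_of_Week = rep(means$Day_of_Week, times = length(minute_cols)),
  Activity = factor(rep(minute_cols, each = nrow(means)), levels = minute_cols),
  Minutes = unlist(means[minute_cols], use.names = FALSE)
)
print(minutes_long)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  activity_plot <- ggplot2::ggplot(
    minutes_long,
    ggplot2::aes(x = Day_of_Week, y = Minutes, fill = Activity)
  ) +
    ggplot2::geom_bar(stat = "identity", position = "dodge") +
    ggplot2::labs(title = "Minutes Levels by Day of Week",
                  x = "Activity Day", y = "Minutes")
}
```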

 

“CaloriesHours”
# Add a column to indicate the day of the week
CaloriesHours$Day_of_Week <- weekdays(as.Date(CaloriesHours$Date))
# Define the levels for the "Day_of_Week" factor in the desired order
weekday_levels <- c("Sunday","Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
# Convert "Day_of_Week" to a factor with the specified levels
CaloriesHours$Day_of_Week <- factor(CaloriesHours$Day_of_Week, levels = weekday_levels)
# Create a heatmap
heatmap_plot <- ggplot(CaloriesHours, aes(x = Time, y = Day_of_Week, fill = Calories)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red", name = "Calories") +
  labs(title = "Calories Burned per Hour and Day of Week",
       x = "Hour of the Day", y = "Day of the Week") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
        legend.position = "bottom") +
  theme(axis.text.y = element_text(size = 7))
# Display the plot
print(heatmap_plot)
# Export the new data frame to Excel.
write_xlsx(CaloriesHours, "CaloriesHours_df.xlsx")

  • The hourly calorie burn is most notable (above 500 calories) on Saturdays between 12:00 p.m. and 2:00 p.m., Fridays at 2:00 p.m., Wednesdays at 5:00 p.m. and Sundays at 4:00 p.m.

 

“ActivityDays”
# Add a column to indicate the day of the week
ActivityDays$Day_of_Week <- weekdays(as.Date(ActivityDays$ActivityDate))
# Create a new column adding LightlyActiveMinutes, FairlyActiveMinutes, and VeryActiveMinutes
ActivityDays <- ActivityDays %>%
  group_by(Id) %>%
  mutate(TotalMinutes = LightlyActiveMinutes + FairlyActiveMinutes + VeryActiveMinutes,
         AvgTotalMinutes = mean(TotalMinutes, na.rm = TRUE))
# Convert Id column to factor with levels in the original order
ActivityDays$Id <- factor(ActivityDays$Id, levels = unique(ActivityDays$Id))
# Create a bubble chart
ggplot(ActivityDays, aes(x = Day_of_Week, y = Calories, size = TotalMinutes, color = Id)) +
  geom_point(alpha = 0.6) +  # Adjust transparency for better visibility
  geom_hline(yintercept = 1800, linetype = "dashed", color = "magenta3", size = 0.5) +
  geom_hline(yintercept = 2000, linetype = "dashed", color = "blue3", size = 0.5) +
  scale_size_continuous(range = c(0.3, 5)) +  # Adjust the range of bubble sizes
  labs(title = "Bubble Chart of Activity Data",
       x = "Day of Week",
       y = "Total Calories",
       size = "Total Minutes",
       color = "Users") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
  theme(axis.text.y = element_text(size = 6)) +
  scale_y_continuous(labels = scales::comma) +
  annotate("text", x=3,y=600,label="-Minimum daily calories for women: 1,800", color = "magenta3") +
  annotate("text", x=2.88,y=400,label="-Minimum daily calories for men: 2,000", color = "blue3")
# Export the new data frame to Excel.
write_xlsx(ActivityDays, "ActivityDays_df.xlsx")

  • A healthy daily calorie burn ranges from 1,800 to 2,400 calories for adult women and from 2,000 to 3,200 calories for men. The graph shows that users burn between 1,300 and 3,800 calories per day on average. A few users burn less than 1,000 calories on some days of the week, while a larger group burns more than 4,000 calories daily. Overall, users burn a healthy average number of calories per day.
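How many records fall inside a reference range can be computed directly instead of being read from the chart. A minimal sketch with hypothetical daily calorie values; the range of 1,800 to 2,400 is the one quoted above for adult women.

```r
# Hypothetical daily calorie values
daily_calories <- c(1500, 1900, 2200, 2600, 3900)

# Flag the days that fall inside the reference range
in_range <- daily_calories >= 1800 & daily_calories <= 2400
share_in_range <- mean(in_range)

# Minimum, median and maximum of the daily values
summary_stats <- c(
  min = min(daily_calories),
  median = median(daily_calories),
  max = max(daily_calories)
)

print(share_in_range)
print(summary_stats)
```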

 

‣ 5. SHARE

Findings

1. Limitation of the data:

The data has limitations: some values are inaccurate or missing, and the demographic profile of the sample is unknown. Null or missing values may result from users not wearing their devices, and some discrepancies between tables may be due to sampling error.

 

2. Trends from other smart devices:

Steps-hour-day:

  • Measured by the number of steps taken, the period of greatest physical activity is Wednesday between 5:00 p.m. and 7:00 p.m., followed by Saturday between 12:00 p.m. and 2:00 p.m., and finally Tuesday from 11:00 a.m. to 12:00 p.m. and from 5:00 p.m. to 7:00 p.m.

  • Only 33% of users are sedentary (fewer than 5,000 steps per day), 30% are Lightly Active (5,000 to 7,499 steps per day), and 37% are Moderately Active or Very Active (7,500 steps or more per day).

Sleep-hour-day:

  • There is no clear linear correlation between hours of sleep and the days of greatest physical activity. Users sleep the most on Saturdays and Sundays, but this may simply be because it is the weekend.

  • Most users meet the minimum recommended by the Centers for Disease Control and Prevention (CDC): at least 7 hours of sleep per night for adults between 18 and 60 years old.

Intensity-hour-day:

  • The hours of greatest physical intensity fall into two blocks, one from 10:00 a.m. to 2:00 p.m. and another from 5:00 p.m. to 7:00 p.m., with the most marked intensity at 12:00 p.m. and in the late-afternoon block.

  • The majority of users have healthy physical activity, meeting the minimum recommended by the World Health Organization (WHO): at least 150 minutes (2.5 hours) of moderate-intensity aerobic physical activity per week, or about 21 minutes per day. The full recommendation for adults is 150–300 minutes of moderate-intensity activity per week, or 75–150 minutes of vigorous-intensity activity, or an equivalent combination of the two.

Calories-hour-day:

  • The hourly calorie burn is most notable (above 500 calories) on Saturdays between 12:00 p.m. and 2:00 p.m., Fridays at 2:00 p.m., Wednesdays at 5:00 p.m. and Sundays at 4:00 p.m.

  • The graph shows that users burn between 1,300 and 3,800 calories per day on average. A few users burn less than 1,000 calories on some days of the week, while a larger group burns more than 4,000 calories daily. Overall, users burn a healthy average number of calories per day.

 

Conclusions

The majority of users maintain healthy physical activity, complying with the calorie burning and the minimum daily activity recommended by the World Health Organization (WHO) and with the minimum hours of sleep recommended by the Centers for Disease Control and Prevention (CDC).

This positive result raises three questions for me: 1. Could it be because a high percentage of FitBit users are already physically active people? 2. Are people motivated to be more physically active just by using the FitBit device? 3. Is FitBit using some type of strategy to motivate its users to be more physically active? Answering these questions requires more information, so that a deeper analysis of the data can yield useful conclusions.

 

Proposals for Bellabeat’s marketing strategies

1. Identification of the target audience:

  • Redefine your target audience based on demographics, lifestyle and health behavior. This could include women of any age who are fitness enthusiasts, health conscious, athletes, or looking to improve their overall well-being.

2. Brand positioning:

  • Clearly define the unique value proposition of your health tracking device and how it addresses the needs and pain points of your female audience.

  • Highlight the features and benefits that differentiate your device from the competition and position it as a must-have tool for achieving health and fitness goals.

3. Multi-channel marketing approach:

  • Develop a multi-channel marketing strategy to reach your target audience across multiple platforms and channels, including digital, social media, email marketing and traditional channels.

  • Leverage online advertising, influencer partnerships, content marketing, and search engine optimization (SEO) to increase brand visibility and reach.

4. Content Marketing and Thought Leadership:

  • Create valuable and informative content related to women’s health, fitness, nutrition and wellness to establish your brand as a thought leader in the industry.

  • Share insightful articles, blog posts, videos, and infographics that provide tips, advice, and resources to help users achieve their health and fitness goals.

5. User Engagement and Community Building:

  • Foster a sense of community among users of your health tracking device by creating online forums, women’s social media groups, and community events where users can connect, share their experiences, and support each other.

  • Encourage user-generated content and testimonials to showcase real-life success stories and build trust in your brand.

6. Partnerships and collaborations:

  • Explore strategic partnerships with influential women in fitness, health professionals, gyms, wellness centers and other relevant organizations to expand your reach and access new audiences.

  • Collaborate with healthcare providers and insurance companies to promote the use of your health tracking device as part of wellness and preventive care programs.

7. Data privacy and security guarantee:

  • Emphasize the importance of data privacy and security to assure users that their personal health information will be protected and handled with the utmost care.

  • Implement strong security measures and comply with industry standards and regulations to maintain the confidentiality and integrity of user data.

8. Feedback and continuous improvements:

  • Solicit user feedback through surveys, reviews, and customer support channels to identify areas for improvement and innovation.

  • Actively listen to user feedback and incorporate their suggestions and preferences into product development and marketing initiatives.

 

‣ 6. ACT

Possible next steps for stakeholders to take based on my findings

Firstly, evaluate the most suitable process for collecting complete, accurate and demographically defined data, with the aim of being able to obtain useful conclusions for decision making.

Secondly, once the data has been analyzed and the findings and conclusions have been shared, a meeting must be held with the design and programming team to determine the feasibility and costs of implementing improvements in the functionalities and design of the Bellabeat product.

Thirdly, a meeting must be held with the marketing department to study the advertising and marketing strategy to be carried out so that the product is positioned as unique and the benefits of its use among women are made known.

 

                                                   Thank you!