Bellabeat is a manufacturer of high-tech products focused on women’s health. Its founders developed beautifully designed technology to inform and inspire women around the world. Collecting data on activity, sleep, stress and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Bellabeat is a successful small company, but it has the potential to become a larger player in the global smart device market.
As a junior data analyst working in the marketing analytics team, I have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers use their non-Bellabeat smart devices. The insights I uncover will help guide marketing strategy and could help unlock new growth opportunities for the company.
In order to answer the key business questions, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.
This data set (FitBit Fitness Tracker Data) was obtained from the MÖBIUS account on Kaggle and is released under a CC0: Public Domain license. The data is neither original nor current: it was collected by third parties through a survey distributed via Amazon Mechanical Turk between March 12, 2016 and May 12, 2016.
The data provided is secondary, external data containing information from thirty-three (33) eligible Fitbit users who consented to submit personal tracking data, including minute-level output for physical activity, heart rate, and sleep monitoring. The information is organized into sixteen (16) .csv (comma-separated values) files. Each file contains structured quantitative and nominal data organized in tables of numbers, strings and boolean values. Some tables are in wide format, where each column holds a single variable with a specific data type and associated constraints, and others are in long format, where each subject has data in multiple rows.
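To make the wide versus long distinction concrete, here is a minimal base R sketch. The table `wide`, its values and the column name `Variable` are made up for illustration and are not taken from the FitBit files.

```r
# Toy wide-format table: one row per user-day, one column per variable
# (all values are invented for illustration)
wide <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  VeryActiveMinutes = c(25, 21, 0),
  SedentaryMinutes = c(728, 776, 1218)
)

# Reshape to long format: one row per user-day-variable combination
value_cols <- c("VeryActiveMinutes", "SedentaryMinutes")
long <- data.frame(
  Id = rep(wide$Id, times = length(value_cols)),
  ActivityDate = rep(wide$ActivityDate, times = length(value_cols)),
  Variable = rep(value_cols, each = nrow(wide)),
  Minutes = unlist(wide[value_cols], use.names = FALSE)
)

print(long)
```

In a tidyverse workflow, `tidyr::pivot_longer()` performs the same reshaping; base R is used here only so the sketch runs without extra packages.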
I consider the data unreliable because it was not possible to identify and extract the data from the original source. It was also not possible to find other databases from similar studies that could support my analysis. Furthermore, the sample is biased: it is not representative of the population as a whole, it includes no demographic data such as age, gender or place of data collection, and although the description states that the sample covers 2 months, there is only data for a single month.
I was not provided with any data on Bellabeat products with which to analyze how consumers use Bellabeat smart devices.
Given the lack of more recent and reliable sources, it is a challenge to extract the information needed for an analysis that is useful for decision making. Even so, I carry out what was requested and show the data analysis process step by step, ending with my high-level recommendations for Bellabeat’s marketing strategy.
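One way to verify how much time a table really covers is to look at its date range and at the number of distinct days logged per user. The sketch below uses a toy data frame, `toy`, with invented rows; the column names `Id` and `ActivityDate` mirror the daily activity table but are assumptions here.

```r
# Toy stand-in for a daily table (rows are invented for illustration)
toy <- data.frame(
  Id = c(1, 1, 2, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-05-12", "2016-04-12", "2016-05-01"))
)

# First and last date present in the table
date_span <- range(toy$ActivityDate)

# Number of calendar days between first and last date, inclusive
days_covered <- as.integer(diff(date_span)) + 1

# Number of distinct days logged by each user
days_per_user <- tapply(toy$ActivityDate, toy$Id, function(d) length(unique(d)))

print(date_span)
print(days_covered)
print(days_per_user)
```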
We found that the Bellabeat Leaf is the product most comparable with the data provided by FitBit. Therefore, we selected it as the product to which we apply our insights.
Comparison of tracking features: FitBit vs. Bellabeat products
| Brand | Device | Similar tracking features | Different tracking features |
|---|---|---|---|
| FitBit | watch | heart rate, sleep monitoring | minute-level output |
| Bellabeat app | app | stress, sleep | menstrual cycle |
| Bellabeat Leaf | bracelet, necklace or clip | heart rate, sleep | respiratory rate, cardiac coherence, menstrual cycle |
| Bellabeat Time * | watch | stress, sleep | - |
| Bellabeat Spring | water bottle | - | hydration levels |
# Install all the packages needed for data cleaning and transformation
install.packages("tidyverse")
install.packages("lubridate")
install.packages("tidyr")
install.packages("readr")
install.packages("readxl")
install.packages("dplyr")
install.packages("Tmisc")
install.packages("knitr") # Loaded below, so installed here as well
install.packages("yaml") # Loaded below, so installed here as well
install.packages("janitor")
install.packages("writexl")
install.packages("gridExtra")
# Load all the packages needed for data cleaning and transformation
library(tidyverse)
library(lubridate)
library(tidyr)
library(readr)
library(readxl)
library(dplyr)
library(Tmisc)
library(knitr)
library(yaml)
library(janitor)
library(writexl)
library(gridExtra)
# Loading of the 6 useful files resulting from the cleaning, filtering and previous analysis carried out in Google Sheets to the 16 original files
ActivityDays_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/ActivityDays_v3.xlsx")
CaloriesHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/CaloriesHours_v3.xlsx")
IntensitiesDays_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/IntensitiesDays_v3.xlsx")
IntensitiesHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/IntensitiesHours_v3.xlsx")
SleepHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/SleepHours_v3.xlsx")
StepsHours_v3 <- read_excel("Bellabeat Case Study/FitBit Study Data 01.23.2024/StepsHours_v3.xlsx")
# Files preview
View(StepsHours_v3)
View(SleepHours_v3)
View(IntensitiesHours_v3)
View(IntensitiesDays_v3)
View(CaloriesHours_v3)
View(ActivityDays_v3)
# Check the number of users in each table. Tables with fewer than 30 users were not used because they fall below the commonly used minimum sample size of 30. However, I kept the SleepHours table for reference analysis.
n_distinct(StepsHours_v3$Id)
[1] 33
n_distinct(SleepHours_v3$Id)
[1] 24
n_distinct(IntensitiesHours_v3$Id)
[1] 33
n_distinct(IntensitiesDays_v3$Id)
[1] 33
n_distinct(CaloriesHours_v3$Id)
[1] 33
n_distinct(ActivityDays_v3$Id)
[1] 33
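The per-table user counts can also be collected in one pass and compared against a threshold. This is only a sketch on toy tables: the list `tables`, its contents and the small `min_users` value are illustrative assumptions (the analysis itself uses 30).

```r
# Toy tables standing in for the real data frames (Ids are invented)
tables <- list(
  StepsHours = data.frame(Id = c(1, 1, 2, 3)),
  SleepHours = data.frame(Id = c(1, 2, 2))
)

# Hypothetical threshold, kept small so the toy data can fall below it
min_users <- 3

# Count distinct users in every table at once
users_per_table <- vapply(tables, function(df) length(unique(df$Id)), integer(1))

# Names of the tables that do not reach the threshold
below_threshold <- names(users_per_table)[users_per_table < min_users]

print(users_per_table)
print(below_threshold)
```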
# Check the structure and data type of each column in the original data frames
str(StepsHours_v3)
str(SleepHours_v3)
str(IntensitiesHours_v3)
str(IntensitiesDays_v3)
str(CaloriesHours_v3)
str(ActivityDays_v3)
# Convert Date-Time columns to POSIXct object if needed in each file
StepsHours_v3$ActivityHour <- as.POSIXct(StepsHours_v3$ActivityHour)
SleepHours_v3$SleepDay <- as.POSIXct(SleepHours_v3$SleepDay)
IntensitiesHours_v3$ActivityHour <- as.POSIXct(IntensitiesHours_v3$ActivityHour)
CaloriesHours_v3$ActivityHour <- as.POSIXct(CaloriesHours_v3$ActivityHour)
# Separate Date-Time columns into "Date" and "Time" columns in each file
StepsHours <- StepsHours_v3 %>%
mutate(Date = as.Date(ActivityHour), # Extract date component
Time = format(ActivityHour, "%H:%M:%S")) # Extract time component
SleepHours <- SleepHours_v3 %>%
mutate(Date = as.Date(SleepDay), # Extract date component
Time = format(SleepDay, "%H:%M:%S")) # Extract time component
IntensitiesHours <- IntensitiesHours_v3 %>%
mutate(Date = as.Date(ActivityHour), # Extract date component
Time = format(ActivityHour, "%H:%M:%S")) # Extract time component
CaloriesHours <- CaloriesHours_v3 %>%
mutate(Date = as.Date(ActivityHour), # Extract date component
Time = format(ActivityHour, "%H:%M:%S")) # Extract time component
# Convert data types. Remove rows with missing values
StepsHours$StepTotal <- as.numeric(StepsHours$StepTotal) # Convert columns to numeric
StepsHours <- na.omit(StepsHours) # Remove rows with missing values
SleepHours <- SleepHours %>% # Convert columns to numeric
mutate(across(c(TotalMinutesAsleep, TotalTimeInBed, TotalHoursAsleep, TotalHoursInBed), as.numeric))
SleepHours <- na.omit(SleepHours) # Remove rows with missing values
IntensitiesHours <- IntensitiesHours %>% # Convert columns to numeric
mutate(across(c(TotalIntensity, AverageIntensity), as.numeric))
IntensitiesHours <- na.omit(IntensitiesHours) # Remove rows with missing values
IntensitiesDays <- IntensitiesDays_v3 %>% # Convert columns to numeric
mutate(across(c(SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, VeryActiveMinutes, SedentaryActiveDistance, LightActiveDistance, ModeratelyActiveDistance, VeryActiveDistance), as.numeric))
IntensitiesDays <- na.omit(IntensitiesDays) # Remove rows with missing values
CaloriesHours$Calories <- as.numeric(CaloriesHours$Calories) # Convert to numeric
CaloriesHours <- na.omit(CaloriesHours) # Remove rows with missing values
ActivityDays <- ActivityDays_v3 %>%
mutate(across(c(TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories), as.numeric)) # Convert to numeric
ActivityDays <- na.omit(ActivityDays) # Remove rows with missing values
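Because `na.omit()` drops rows silently, it can help to count the missing values per column first and record how many rows are lost. A minimal sketch on a toy table follows; the data frame `toy` and its values are invented.

```r
# Toy table with one value that cannot be converted to a number
toy <- data.frame(
  Id = c(1, 2, 3),
  StepTotal = c("373", "n/a", "160"),
  stringsAsFactors = FALSE
)

# Non-numeric strings become NA (the conversion warning is expected here)
toy$StepTotal <- suppressWarnings(as.numeric(toy$StepTotal))

# Missing values per column, checked before any rows are removed
na_per_column <- colSums(is.na(toy))

# Drop incomplete rows and record how many were lost
cleaned <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(cleaned)

print(na_per_column)
print(rows_dropped)
```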
# Identify and remove duplicates if necessary
sum(duplicated(StepsHours)) # Identify duplicates
[1] 0 # No duplicates
sum(duplicated(SleepHours))
[1] 3
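SleepHours <- distinct(SleepHours) # Eliminate duplicates; distinct() returns a new table, so the result is assigned back
SleepHours # Print the deduplicated table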
# A tibble: 410 × 9
Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed TotalHoursAsleep TotalHoursInBed
<dbl> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1503960366 2016-04-12 00:00:01 1 327 346 5.45 5.77
2 1503960366 2016-04-13 00:00:00 2 384 407 6.4 6.78
3 1503960366 2016-04-15 00:00:00 1 412 442 6.87 7.37
4 1503960366 2016-04-16 00:00:00 2 340 367 5.67 6.12
5 1503960366 2016-04-17 00:00:00 1 700 712 11.7 11.9
6 1503960366 2016-04-19 00:00:00 1 304 320 5.07 5.33
7 1503960366 2016-04-20 00:00:00 1 360 377 6 6.28
8 1503960366 2016-04-21 00:00:00 1 325 364 5.42 6.07
9 1503960366 2016-04-23 00:00:00 1 361 384 6.02 6.4
10 1503960366 2016-04-24 00:00:00 1 430 449 7.17 7.48
# ℹ 400 more rows
# ℹ 2 more variables: Date <date>, Time <chr>
# ℹ Use `print(n = ...)` to see more rows
sum(duplicated(IntensitiesHours))
[1] 0
sum(duplicated(IntensitiesDays_v3))
[1] 0
sum(duplicated(CaloriesHours))
[1] 0
sum(duplicated(ActivityDays_v3))
[1] 0
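Functions that remove duplicates return a new table instead of changing the original, so the result has to be stored for the removal to take effect. The base R sketch below shows this on an invented table, `toy`.

```r
# Toy table with one exact duplicate row (values are invented)
toy <- data.frame(
  Id = c(1, 1, 2),
  TotalMinutesAsleep = c(327, 327, 384)
)

# Count fully duplicated rows
n_dupes <- sum(duplicated(toy))

# Keep only the first occurrence of each row and store the result
deduped <- toy[!duplicated(toy), ]

print(n_dupes)
print(nrow(toy))     # the original table is unchanged
print(nrow(deduped)) # the stored copy has the duplicate removed
```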
# Number of observations in each dataframe
nrow(StepsHours)
[1] 21165
nrow(SleepHours)
[1] 413 # Count before the 3 duplicate rows are dropped; distinct(SleepHours) leaves 410
nrow(IntensitiesHours)
[1] 22099
nrow(IntensitiesDays)
[1] 940
nrow(CaloriesHours)
[1] 22099
nrow(ActivityDays)
[1] 940
# GRAPHIC 1
# Add a column to indicate the day of the week
StepsHours$Day_of_Week <- weekdays(as.Date(StepsHours$Date))
head(StepsHours) # View the updated data frame
# Create a pivot table for StepTotal by Day_of_Week and Time
pivot_table <- aggregate(StepTotal ~ Day_of_Week + Time, data = StepsHours, FUN = sum)
# Create the visualization (heatmap)
ggplot(data = pivot_table, aes(x = Time, y = Day_of_Week, fill = StepTotal)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "blue") +
labs(title = "Steps Taken by Hour of the Day and Day of the Week",
x = "Hour of the Day", y = "Day of the Week") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Export the new data frame to Excel.
write_xlsx(StepsHours, "StepsHours_df.xlsx")
# GRAPHIC 2
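# Copy the cleaned data frame under the name used in the rest of this section
StepsHours_df <- StepsHours
# Convert Id column to factor with levels in the original order
StepsHours_df$Id <- factor(StepsHours_df$Id, levels = unique(StepsHours_df$Id))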
# Create a summary data frame with the total steps per user
summary_steps <- StepsHours_df %>%
group_by(Id) %>%
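summarise(total_steps = sum(StepTotal / 33)) %>% # Approximates daily steps with a fixed divisor of 33; sum(StepTotal) / n_distinct(Date) would give each user's exact daily average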
mutate(activity_type = factor(case_when(
total_steps < 5000 ~ "Sedentary Active",
total_steps >= 5000 & total_steps < 7500 ~ "Lightly Active",
total_steps >= 7500 & total_steps < 10000 ~ "Moderately Active",
total_steps >= 10000 ~ "Very Active"), levels = c("Sedentary Active", "Lightly Active","Moderately Active","Very Active"))) %>%
arrange(-total_steps)
# Convert Id to a factor to ensure it's treated as categorical
summary_steps$Id <- factor(summary_steps$Id)
# Create the bar plot
steps_barplot <- ggplot(data = summary_steps, aes(x = Id, y = total_steps, fill = activity_type)) +
geom_bar(stat = "identity", color = "blue") +
geom_hline(yintercept = 5000, linetype = "dashed", color = "red", linewidth = 0.5) +
labs(title = "Total Steps per User",
x = "User ID",
y = "Total Steps") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
theme(axis.text.y = element_text(size = 6)) +
scale_y_continuous(labels = scales::comma) +
annotate("text", x=11,y=13000,label="-Minimum daily steps: 5000", color = "red")
# Display the plot
print(steps_barplot)
Recent studies suggest that around 5,000 steps per day, or even fewer, may be enough to see a health benefit. Against that threshold, 33% of users are sedentary, 30% are Lightly Active (slightly more than 5,000 steps per day), and 37% are Moderately Active or Very Active.
We can therefore conclude that the majority of users are physically active at a healthy level. However, we cannot be certain that this reflects the general population, since we do not know FitBit’s market niche; it is possible that most FitBit users are already physically active people.
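The share of users in each activity category can be computed directly from the step thresholds used above. The sketch below applies `cut()` to made-up average daily step values, so the resulting percentages are illustrative only and are not the figures reported in this analysis.

```r
# Made-up average daily steps for six hypothetical users
avg_daily_steps <- c(3200, 5600, 8100, 12000, 4900, 7400)

# Same thresholds as in the bar plot: 5,000 / 7,500 / 10,000 steps
activity_type <- cut(
  avg_daily_steps,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("Sedentary Active", "Lightly Active", "Moderately Active", "Very Active"),
  right = FALSE # intervals are closed on the left, e.g. [5000, 7500)
)

# Number of users and percentage of users per category
counts <- table(activity_type)
share <- round(100 * prop.table(counts), 1)

print(counts)
print(share)
```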
# Add a column to indicate the day of the week
SleepHours$Day_of_Week <- weekdays(as.Date(SleepHours$Date))
head(SleepHours) # View the updated data frame
# Create a vector of custom breaks for the Y-axis and define the levels for the "Day_of_Week" factor
custom_breaks <- c(5, 6, 7, 8, 9, 10)
weekday_levels <- c("Sunday","Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
# Convert "Day_of_Week" to a factor with the specified levels
SleepHours$Day_of_Week <- factor(SleepHours$Day_of_Week, levels = weekday_levels)
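Note that `weekdays()` returns day names in the language of the current locale, so on a non-English system the English factor levels above would not match and the column would become `NA`. A locale-independent alternative, sketched here on three example dates, derives the day from the numeric weekday index instead.

```r
# Example dates taken from the period covered by the data
dates <- as.Date(c("2016-04-12", "2016-04-16", "2016-04-17"))

weekday_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                    "Thursday", "Friday", "Saturday")

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday), independent of locale
day_index <- as.POSIXlt(dates)$wday + 1

# Map the index to English names and keep the calendar order
day_of_week <- factor(weekday_levels[day_index], levels = weekday_levels)

print(day_of_week)
```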
# Create the plot with custom Y-axis breaks, multiple fill colors, and black lines/points
ggplot(SleepHours, aes(x = Day_of_Week, y = TotalHoursAsleep, fill = factor(Day_of_Week))) +
geom_boxplot(color = "black") + # Specify the color of the boxplot outlines
geom_point(color = "black") + # Specify the color of the points
geom_line(color = "black", linewidth = 0.1) + # Specify the color and width of the lines
labs(title = "Total Hours Asleep by Day of the Week",
x = "Day of the Week", y = "Total Hours Asleep") +
scale_y_continuous(breaks = custom_breaks) +
scale_fill_manual(values = c("brown2", "lightsalmon", "yellow1", "yellowgreen", "skyblue", "royalblue", "mediumpurple1")) +
theme_bw()
# Export the new data frame to Excel.
write_xlsx(SleepHours, "SleepHours_df.xlsx")
There is no clear linear relationship between hours of sleep and the days with the most physical activity. Users sleep the most on Saturdays and Sundays, but this may simply be because it is the weekend.
Most users meet the minimum recommended by the Centers for Disease Control and Prevention (CDC): at least 7 hours of sleep per night for adults between 18 and 60 years old.
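The statement that most users reach 7 hours can be checked numerically, either per night or per user. The sketch below uses a toy table, `toy_sleep`, with invented values; only the 7-hour threshold comes from the text.

```r
# Toy sleep log: two users, two nights each (values are invented)
toy_sleep <- data.frame(
  Id = c(1, 1, 2, 2),
  TotalHoursAsleep = c(5.45, 7.17, 6.4, 8.2)
)

recommended_hours <- 7

# Share of individual nights with at least 7 hours asleep
share_nights <- mean(toy_sleep$TotalHoursAsleep >= recommended_hours)

# Share of users whose average night reaches 7 hours
avg_by_user <- tapply(toy_sleep$TotalHoursAsleep, toy_sleep$Id, mean)
share_users <- mean(avg_by_user >= recommended_hours)

print(share_nights)
print(avg_by_user)
print(share_users)
```

The two shares answer different questions, so it is worth stating which one a conclusion is based on.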
# Add a column to indicate the day of the week
IntensitiesHours$Day_of_Week <- weekdays(as.Date(IntensitiesHours$Date))
# Extract hour from the Time column
IntensitiesHours$Hour <- as.integer(substr(IntensitiesHours$Time, 1, 2))
# Summarize TotalIntensity per hour and day of the week
IntensitiesHours <- IntensitiesHours %>%
group_by(Day_of_Week, Hour) %>% # Group by day and hour so each heatmap tile gets its own value
mutate(TotalIntensityPerHour = sum(TotalIntensity),
AverageIntensityPerHour = mean(TotalIntensity)) %>%
ungroup()
# Create a heatmap of intensity by hour and day of the week
ggplot(IntensitiesHours, aes(x = factor(Hour), y = factor(Day_of_Week), fill = TotalIntensityPerHour)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(title = "Intensity Patterns by Hour and Day of the Week",
x = "Hour of the Day", y = "Day of the Week",
fill = "Total Intensity Per Hour") +
theme(axis.text.x = element_text(hjust = 1, size = 6)) +
theme(axis.text.y = element_text(size = 8))
# Export the new data frame to Excel.
write_xlsx(IntensitiesHours, "IntensitiesHours_df.xlsx")
# GRAPHIC 1
# Add a column to indicate the day of the week
IntensitiesDays$Day_of_Week <- weekdays(as.Date(IntensitiesDays$ActivityDay))
# Converting columns to numeric
IntensitiesDays <- IntensitiesDays %>%
mutate(across(c(SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, VeryActiveMinutes, SedentaryActiveDistance, LightActiveDistance, ModeratelyActiveDistance, VeryActiveDistance), as.numeric))
# Create scatter plots with regression lines for each activity level
sedentary_plot <- ggplot(IntensitiesDays, aes(x = SedentaryMinutes, y = SedentaryActiveDistance)) +
geom_point(color = "cornflowerblue", alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE, color = "maroon") +
labs(title = "Sedentary Activity",
x = "Sedentary Minutes", y = "Sedentary Distance") +
facet_wrap(~Day_of_Week) +
theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
theme(axis.text.y = element_text(size = 5)) # Adjust y-axis text size here
lightly_active_plot <- ggplot(IntensitiesDays, aes(x = LightlyActiveMinutes, y = LightActiveDistance)) +
geom_point(color = "cornflowerblue", alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE, color = "yellow1") +
labs(title = "Lightly Active Activity",
x = "Lightly Active Minutes", y = "Light Active Distance") +
facet_wrap(~Day_of_Week) +
theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
theme(axis.text.y = element_text(size = 5)) # Adjust y-axis text size here
fairly_active_plot <- ggplot(IntensitiesDays, aes(x = FairlyActiveMinutes, y = ModeratelyActiveDistance)) +
geom_point(color = "cornflowerblue", alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "Fairly Active Activity",
x = "Fairly Active Minutes", y = "Moderately Active Distance") +
facet_wrap(~Day_of_Week) +
theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
theme(axis.text.y = element_text(size = 5)) # Adjust y-axis text size here
very_active_plot <- ggplot(IntensitiesDays, aes(x = VeryActiveMinutes, y = VeryActiveDistance)) +
geom_point(color = "cornflowerblue", alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE, color = "darkmagenta") +
labs(title = "Very Active Activity",
x = "Very Active Minutes", y = "Very Active Distance") +
facet_wrap(~Day_of_Week) +
theme(axis.text.x = element_text(hjust = 1, size = 5)) + # Adjust text size here
theme(axis.text.y = element_text(size = 5)) # Adjust y-axis text size here
# Arrange plots in a grid
grid.arrange(sedentary_plot, lightly_active_plot, fairly_active_plot, very_active_plot, nrow = 2)
# Export the new data frame to Excel.
write_xlsx(IntensitiesDays, "IntensitiesDays_df.xlsx")
There is a positive correlation between active time and distance traveled. For sedentary activity there is no linear correlation, since time increases while distance stays at zero. For Lightly Active activity, distance increases as walking time increases. For Fairly Active and Very Active activity, both the activity time and the distance traveled are shorter than at the Lightly Active intensity, but the positive correlation still holds.
The days with the most activity at each intensity are: Lightly Active: Thursday, Saturday and Sunday; Fairly Active: Tuesday and Sunday; Very Active: Saturday and Sunday.
The majority of users have healthy physical activity, meeting the minimum recommended by the World Health Organization (WHO): adults should get at least 2.5 hours of moderate-intensity aerobic physical activity per week, or about 21 minutes per day. The weekly target can be met with 150–300 minutes of moderate-intensity activity, 75–150 minutes of vigorous-intensity activity, or an equivalent combination of the two.
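The WHO comparison can be sketched as a weekly check per user. Two assumptions are made here and are not confirmed by the data set: FitBit's "fairly active" minutes are treated as moderate intensity and "very active" minutes as vigorous intensity, and one vigorous minute is counted as two moderate minutes (from the 75 versus 150 minute targets). The table `toy` holds invented values for one week.

```r
# Toy week of data for two hypothetical users (values are invented)
toy <- data.frame(
  Id = rep(c(1, 2), each = 7),
  FairlyActiveMinutes = c(rep(20, 7), rep(5, 7)),
  VeryActiveMinutes = c(rep(10, 7), rep(0, 7))
)

# Assumption: 1 vigorous minute counts as 2 moderate minutes
toy$ModerateEquivalent <- toy$FairlyActiveMinutes + 2 * toy$VeryActiveMinutes

# Weekly total of moderate-equivalent minutes per user
weekly_minutes <- tapply(toy$ModerateEquivalent, toy$Id, sum)

# Compare with the lower bound of 150 minutes per week
meets_who <- weekly_minutes >= 150

print(weekly_minutes)
print(meets_who)
```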
# GRAPHIC 2
# Define the order of levels for the "Activity" variable
activity_order <- c("SedentaryActiveDistance", "LightActiveDistance",
"ModeratelyActiveDistance", "VeryActiveDistance")
# Define the levels for the "Day_of_Week" factor in the desired order
weekday_levels <- c("Sunday","Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
# Convert "Day_of_Week" to a factor with the specified levels
IntensitiesDays$Day_of_Week <- factor(IntensitiesDays$Day_of_Week, levels = weekday_levels)
# Calculate mean values for each day of the week across all users
IntensitiesDays_mean <- IntensitiesDays %>%
group_by(Day_of_Week) %>%
summarise_at(vars(SedentaryActiveDistance, LightActiveDistance,
ModeratelyActiveDistance, VeryActiveDistance), mean, na.rm = TRUE)
# Reshape the data into long format for distance variables
IntensitiesDays_long_distance <- gather(IntensitiesDays_mean, key = "Activity", value =
"Distance", SedentaryActiveDistance, LightActiveDistance, ModeratelyActiveDistance, VeryActiveDistance)
# Convert "Activity" to a factor with specified order of levels
IntensitiesDays_long_distance$Activity <- factor(IntensitiesDays_long_distance$Activity,
levels = activity_order)
# Create the bar plot for distance
distance_plot <- ggplot(IntensitiesDays_long_distance, aes(x = Day_of_Week, y = Distance, fill = Activity)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Distance Levels by Day of Week",
x = "Activity Day", y = "Distance") +
scale_fill_manual(values = c("SedentaryActiveDistance" = "cyan3",
"LightActiveDistance" = "olivedrab3",
"ModeratelyActiveDistance" = "mediumorchid1",
"VeryActiveDistance" = "coral")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
theme(axis.text.y = element_text(size = 6))
# Stack both plots vertically (ncol = 1); activity_plot must already exist in the session, since it is not created in this script
grid.arrange(activity_plot, distance_plot, ncol = 1)
# Add a column to indicate the day of the week
CaloriesHours$Day_of_Week <- weekdays(as.Date(CaloriesHours$Date))
# Define the levels for the "Day_of_Week" factor in the desired order
weekday_levels <- c("Sunday","Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
# Convert "Day_of_Week" to a factor with the specified levels
CaloriesHours$Day_of_Week <- factor(CaloriesHours$Day_of_Week, levels = weekday_levels)
# Create a heatmap
heatmap_plot <- ggplot(CaloriesHours, aes(x = Time, y = Day_of_Week, fill = Calories)) +
geom_tile() +
scale_fill_gradient(low = "blue", high = "red", name = "Calories") +
labs(title = "Calories Burned per Hour and Day of Week",
x = "Hour of the Day", y = "Day of the Week") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
legend.position = "bottom") +
theme(axis.text.y = element_text(size = 7))
# Display the plot
print(heatmap_plot)
# Export the new data frame to Excel.
write_xlsx(CaloriesHours, "CaloriesHours_df.xlsx")
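A heatmap tile can only show one value, while the hourly table holds many rows (one per user and date) for each combination of day and hour. One option, sketched below on an invented table `toy`, is to average the calories per day and hour first, in the same way the steps heatmap uses `aggregate()`; choosing the mean is an assumption, and a sum would work equally well.

```r
# Toy hourly table with several rows per day-hour combination (values invented)
toy <- data.frame(
  Day_of_Week = c("Monday", "Monday", "Tuesday", "Tuesday"),
  Time = c("08:00:00", "08:00:00", "08:00:00", "09:00:00"),
  Calories = c(80, 100, 70, 90)
)

# One row per day-hour combination, holding the mean calories
heat <- aggregate(Calories ~ Day_of_Week + Time, data = toy, FUN = mean)

print(heat)
```

The aggregated table `heat` can then be passed to `ggplot()` with `geom_tile()` in place of the raw hourly table.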
# Add a column to indicate the day of the week
ActivityDays$Day_of_Week <- weekdays(as.Date(ActivityDays$ActivityDate))
# Create a new column adding LightlyActiveMinutes, FairlyActiveMinutes, and VeryActiveMinutes
ActivityDays <- ActivityDays %>%
group_by(Id) %>%
mutate(TotalMinutes = LightlyActiveMinutes + FairlyActiveMinutes + VeryActiveMinutes,
AvgTotalMinutes = mean(TotalMinutes, na.rm = TRUE))
# Convert Id column to factor with levels in the original order
ActivityDays$Id <- factor(ActivityDays$Id, levels = unique(ActivityDays$Id))
# Create a bubble chart
ggplot(ActivityDays, aes(x = Day_of_Week, y = Calories, size = TotalMinutes, color = Id)) +
geom_point(alpha = 0.6) + # Adjust transparency for better visibility
geom_hline(yintercept = 1800, linetype = "dashed", color = "magenta3", linewidth = 0.5) +
geom_hline(yintercept = 2800, linetype = "dashed", color = "blue3", linewidth = 0.5) +
scale_size_continuous(range = c(0.3, 5)) + # Adjust the range of bubble sizes
labs(title = "Bubble Chart of Activity Data",
x = "Day of Week",
y = "Total Calories",
size = "Total Minutes",
color = "Users") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6)) +
theme(axis.text.y = element_text(size = 6)) +
scale_y_continuous(labels = scales::comma) +
annotate("text", x=3,y=600,label="-Minimum daily calories for women: 1,800", color = "magenta3") +
annotate("text", x=2.88,y=400,label="-Minimum daily calories for men: 2,000", color = "blue3")
# Export the new data frame to Excel.
write_xlsx(ActivityDays, "ActivityDays_df.xlsx")
Firstly, evaluate the most suitable process for collecting complete, accurate and demographically defined data, with the aim of being able to obtain useful conclusions for decision making.
Secondly, once the data has been analyzed and the findings and conclusions have been shared, a meeting must be held with the design and programming team to determine the feasibility and costs of implementing improvements in the functionalities and design of the Bellabeat product.
Thirdly, a meeting must be held with the marketing department to study the advertising and marketing strategy to be carried out so that the product is positioned as unique and the benefits of its use among women are made known.
Thank you!