I’m a junior data analyst on the marketing team at Bellabeat, a company that makes health-focused smart devices for women. I’ve been asked to analyze data from one of our products to understand how people are using it. The insights I find will help shape our marketing strategy and support the company’s growth.
🔹 Characters
Urška Sršen: Cofounder and Chief Creative Officer of Bellabeat.
Sando Mur: Cofounder, mathematician, and key member of Bellabeat’s executive team.
Bellabeat Marketing Analytics Team: A group of data analysts who collect, analyze, and report data to support Bellabeat’s marketing strategy. I joined this team six months ago as a junior data analyst and have been learning about the company’s mission and how to contribute to its success.
🔹 Products
Bellabeat App: Tracks and displays user data related to activity, sleep, stress, menstrual cycle, and mindfulness. Syncs with all Bellabeat smart wellness products.
Leaf: A stylish wellness tracker (wearable as a bracelet, necklace, or clip). Tracks activity, sleep, and stress; syncs with the Bellabeat app.
Time: A classic-looking wellness watch with smart features. Tracks activity, sleep, and stress; syncs with the Bellabeat app.
Spring: A smart water bottle that tracks daily water intake. Helps users stay hydrated; syncs with the Bellabeat app.
Bellabeat Membership: A subscription service offering 24/7 personalized guidance, tailored to users’ lifestyles and wellness goals, on:
Nutrition
Activity
Sleep
Health & Beauty
Mindfulness
Sršen asks you to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. She then wants you to select one Bellabeat product to apply these insights to in your presentation. These questions will guide your analysis:
1. What are some trends in smart device usage?
2. How could these trends apply to Bellabeat customers?
3. How could these trends help influence Bellabeat marketing strategy?
As part of the Bellabeat marketing analytics team, I was encouraged by Urška Sršen to explore public data on smart device users’ habits. She pointed me to the Fitbit Fitness Tracker Data (CC0 Public Domain via Mobius on Kaggle).
This dataset includes minute-level data from 30 Fitbit users, covering daily activity, steps, heart rate, and sleep. It offers useful insights into user behavior, which can help guide Bellabeat’s product and marketing strategies.
I prepared the data for analysis by downloading it and loading it into RStudio. The code below reads each CSV file into its own data frame, named after the file, so that cleaning can begin.
library(tools)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.4 ✔ tidyr 1.3.1
## ✔ readr 2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
# Set working directory to the folder where your CSV files are located
setwd("C:/Users/oland/Documents/case_study/Clean")
# List all CSV files in the directory
csv_files <- list.files(pattern = "\\.csv$", full.names = TRUE)
# Loop through each file and load it into its own DataFrame
for (file in csv_files) {
  # Create variable name from the file name (without .csv extension)
  df_name <- file_path_sans_ext(basename(file))
  # Read the CSV and assign it to the environment
  assign(df_name, read.csv(file, stringsAsFactors = FALSE))
}
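As an alternative to creating one global variable per file with assign(), the same files could be read into a single named list, which is often easier to inspect and loop over. The following is only a sketch under stated assumptions: the folder demo_dir, the two file names, and their contents are made up for illustration and are not the real Fitbit files.

```r
library(tools)

# Hypothetical demo folder with two tiny CSV files (not the real Fitbit data)
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Id = 1:2, TotalSteps = c(1000, 2500)),
          file.path(demo_dir, "date_activity.csv"), row.names = FALSE)
write.csv(data.frame(Id = 1:2, StepTotal = c(40, 60)),
          file.path(demo_dir, "Steps.csv"), row.names = FALSE)

# List the CSV files and read each one into an element of a list
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)

# Name each element after its file (without the .csv extension)
names(tables) <- file_path_sans_ext(basename(csv_files))

# Each table is then available by name
str(tables$date_activity)
```

With this layout, tables$Steps plays the role of the Steps data frame used later, and names(tables) shows at a glance which files were loaded.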
I reviewed the daily activity data to see how total steps vary by day of the week.
# Step 1: Convert ActivityDate column to Date type
date_activity$ActivityDate <- as.Date(date_activity$ActivityDate, format = "%m/%d/%Y")
# Step 2: Extract the weekday name and store it in a new 'day' column
date_activity$day <- format(date_activity$ActivityDate, "%A")
# Step 3: Check the structure of ActivityDate
str(date_activity$ActivityDate)
## Date[1:457], format: "2016-03-25" "2016-03-26" "2016-03-27" "2016-03-28" "2016-03-29" ...
# Step 4: Convert 'day' to a factor whose levels follow weekday order (Monday first)
date_activity$day <- factor(date_activity$day,
                            levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                                       "Friday", "Saturday", "Sunday"))
library(ggplot2)
# geom_col() stacks every daily record, so each bar is the sum of steps logged on that weekday
ggplot(date_activity, aes(x = day, y = TotalSteps)) +
  geom_col(fill = "blue") +
  labs(title = "Total Steps per Day of the Week",
       x = "Day of the Week",
       y = "Total Steps") +
  scale_y_continuous(labels = comma) + # comma() from scales adds thousands separators
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
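Because the bars above add up every record for a weekday, a weekday that happens to appear more often in the data will look more active even if typical daily steps are the same. One way to check this is to compare average steps per weekday instead. The sketch below uses a small made-up data frame, toy_activity, standing in for date_activity; the values are illustrative only, and weekday names are derived from the numeric weekday so the result does not depend on the system locale.

```r
# Toy data standing in for date_activity (values are made up)
toy_activity <- data.frame(
  ActivityDate = as.Date(c("2016-03-28", "2016-04-04", "2016-03-29", "2016-04-03")),
  TotalSteps   = c(8000, 10000, 6000, 12000)
)

# as.POSIXlt()$wday returns 0 for Sunday through 6 for Saturday
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
toy_activity$day <- factor(
  day_names[as.POSIXlt(toy_activity$ActivityDate)$wday + 1],
  levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")
)

# Mean steps per weekday; weekdays with no records are simply absent
avg_steps <- aggregate(TotalSteps ~ day, data = toy_activity, FUN = mean)
print(avg_steps)
```

The resulting avg_steps table could be passed to ggplot() with geom_col() in the same way as the chart above to plot averages rather than totals.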
I then reviewed the hourly step data to see how total steps vary by hour of the day.
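# Append ":00" seconds to any value that contains only HH:MM
Steps$ActivityHour <- ifelse(
  grepl("^[0-9]{2}:[0-9]{2}$", Steps$ActivityHour),
  paste0(Steps$ActivityHour, ":00"),
  Steps$ActivityHour)
# Parse the month/day/year timestamps into date-times
Steps$Hour <- mdy_hms(Steps$ActivityHour)
# Split the date-time text into separate 'date' and 'hour' columns
Steps <- Steps %>%
  mutate(Hour = as.character(Hour)) %>%
  separate(Hour, into = c("date", "hour"), sep = " ")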
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1021 rows [1, 25, 49, 73,
## 97, 121, 145, 169, 193, 217, 241, 265, 289, 313, 337, 361, 385, 409, 433, 457,
## ...].
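# Rows with a missing hour are midnight records (as.character() drops a 00:00:00 time),
# so label them as the first hour of the day rather than "24:00:00"
Steps <- Steps %>%
  mutate(hour = ifelse(is.na(hour), "00:00:00", hour))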
ggplot(Steps, aes(x = hour, y = StepTotal)) +
  geom_col(fill = "green", color = "green") + # geom_col() stacks the hourly records, so each bar is the sum of steps for that hour
  labs(title = "Total Steps vs Hour", x = "Hour of the Day", y = "Total Steps") +
  scale_y_continuous(labels = label_comma()) + # Format y-axis with commas (prevents scientific notation)
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
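The warning in the chunk above appears because converting a date-time to text with as.character() can drop the time part at midnight, which leaves nothing for separate() to split. A possible alternative, sketched here with made-up values, is to extract the hour with format(), which always writes the time, and then summarize steps per hour. The data frame toy_steps and its numbers are assumptions for illustration, not the real Steps table.

```r
# Toy hourly records (made-up values); timestamps are built directly as POSIXct
toy_steps <- data.frame(
  Hour = as.POSIXct(c("2016-03-12 00:00:00", "2016-03-12 12:00:00",
                      "2016-03-13 00:00:00", "2016-03-13 12:00:00"),
                    tz = "UTC"),
  StepTotal = c(10, 500, 30, 700)
)

# format() keeps midnight as "00:00:00" instead of dropping the time
toy_steps$hour <- format(toy_steps$Hour, "%H:%M:%S")

# Average steps for each hour of the day across all dates
avg_by_hour <- aggregate(StepTotal ~ hour, data = toy_steps, FUN = mean)
print(avg_by_hour)
```

Averaging per hour, rather than summing, also keeps the comparison fair if some hours have fewer records than others.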
Next, I analyzed how users’ tracked minutes are distributed across the four activity-intensity categories (very active, fairly active, lightly active, and sedentary).
# Summarize activity data
activity_totals <- data.frame(
  category = c("Very Active", "Fairly Active", "Lightly Active", "Sedentary"),
  minutes = colSums(date_activity[, c("VeryActiveMinutes", "FairlyActiveMinutes",
                                      "LightlyActiveMinutes", "SedentaryMinutes")])
)
# Calculate fraction and percent labels
activity_totals$fraction <- activity_totals$minutes / sum(activity_totals$minutes)
activity_totals$percent_label <- paste0(" (", percent(activity_totals$fraction), ")")
# percent for the legend
activity_totals$category_label <- paste0(activity_totals$category, activity_totals$percent_label)
# pie chart
ggplot(activity_totals, aes(x = "", y = fraction, fill = category_label)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y", start = 0) +
  labs(title = "Activity Distribution", fill = "Activity Type") +
  theme_void() +
  theme(
    legend.position = "right",
    plot.title = element_text(hjust = 0.5)
  )
High engagement with passive tracking: Users spend significantly more time in sedentary and lightly active states, suggesting they rely on smart devices primarily for passive health monitoring.
Activity levels peak around 12 PM and 8 PM: Hourly step totals show noticeable spikes at these times, suggesting users are most physically engaged during midday breaks and in the evening.
Users take more steps on Mondays and Sundays: This suggests motivation is high at the start of the week and users may also be more active on weekends.
The analysis shows that the Fitbit users in this sample are most physically active around midday (12 PM) and in the evening (8 PM), with a noticeable difference in step counts between Saturday and Sunday: Sunday shows higher activity. If Bellabeat customers behave similarly, they may use Sunday as a reset or prep day before the week begins, engaging in more movement or wellness-related routines. Bellabeat can leverage these insights by aligning in-app features, like guided workouts or self-care checklists, with this weekend behavior. The large share of sedentary and lightly active time also suggests that users value effortless, integrated wellness tools they don’t have to think twice about.
These patterns offer a strategic opportunity for Bellabeat to tailor its marketing and engagement tactics. With higher activity on Sundays, Bellabeat could introduce themes like “Self-Care Sunday” to encourage reflection, mindfulness, and movement before the new week. Targeted campaigns or app notifications scheduled around 12 PM and 8 PM—times when users are most active—can improve user interaction. Marketing should also highlight the brand’s ability to seamlessly support wellness through passive features like stress tracking, step counts, and sleep monitoring. These trends support a strategy focused on routine-based engagement, weekend wellness, and user-aligned timing.