Bellabeat is a high-tech manufacturer of health-focused products designed specifically for women. Although Bellabeat is currently a successful small company, it has significant potential to grow into a major player in the global smart device market. This analysis explores smart device usage patterns to uncover data-driven insights that can inform Bellabeat’s marketing strategy and support future growth.
The goal of this analysis is to gain insight into how consumers use their smart devices, identify new growth opportunities for the company, and recommend improvements to the Bellabeat marketing strategy based on the trends found.
For this analysis we are using the [Fitbit Fitness Tracker Data](https://www.kaggle.com/datasets/arashnic/fitbit/data) from Kaggle.
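# Load the packages used throughout the analysis
library(tidyverse)  # dplyr, tidyr, ggplot2, readr, tibble
library(lubridate)  # date-time parsing

activity <- readr::read_csv("data/dailyActivity_merged.csv", show_col_types = FALSE)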
intensity <- readr::read_csv("data/hourlyIntensities_merged.csv", show_col_types = FALSE)
calories <- readr::read_csv("data/hourlyCalories_merged.csv", show_col_types = FALSE)
sleep <- readr::read_csv("data/sleepDay_merged.csv", show_col_types = FALSE)
weight <- readr::read_csv("data/weightLogInfo_merged.csv", show_col_types = FALSE)
I verified the imported data by inspecting the first few rows and structure using head() and glimpse() functions.
knitr::kable(head(activity, 6))
| Id | ActivityDate | TotalSteps | TotalDistance | TrackerDistance | LoggedActivitiesDistance | VeryActiveDistance | ModeratelyActiveDistance | LightActiveDistance | SedentaryActiveDistance | VeryActiveMinutes | FairlyActiveMinutes | LightlyActiveMinutes | SedentaryMinutes | Calories |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1503960366 | 4/12/2016 | 13162 | 8.50 | 8.50 | 0 | 1.88 | 0.55 | 6.06 | 0 | 25 | 13 | 328 | 728 | 1985 |
| 1503960366 | 4/13/2016 | 10735 | 6.97 | 6.97 | 0 | 1.57 | 0.69 | 4.71 | 0 | 21 | 19 | 217 | 776 | 1797 |
| 1503960366 | 4/14/2016 | 10460 | 6.74 | 6.74 | 0 | 2.44 | 0.40 | 3.91 | 0 | 30 | 11 | 181 | 1218 | 1776 |
| 1503960366 | 4/15/2016 | 9762 | 6.28 | 6.28 | 0 | 2.14 | 1.26 | 2.83 | 0 | 29 | 34 | 209 | 726 | 1745 |
| 1503960366 | 4/16/2016 | 12669 | 8.16 | 8.16 | 0 | 2.71 | 0.41 | 5.04 | 0 | 36 | 10 | 221 | 773 | 1863 |
| 1503960366 | 4/17/2016 | 9705 | 6.48 | 6.48 | 0 | 3.19 | 0.78 | 2.51 | 0 | 38 | 20 | 164 | 539 | 1728 |
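
Beyond looking at the first rows, a quick structural check can also be scripted. The sketch below uses a small made-up tibble (not the Fitbit data) to show one way of counting missing values per column and fully duplicated rows. The object name `toy` and its values are assumptions for illustration only.

```r
library(dplyr)
library(tibble)

# Toy data standing in for an imported table (values are made up)
toy <- tibble(
  Id = c(1, 1, 2, 2),
  ActivityDate = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  TotalSteps = c(1000, 1000, NA, 5000)
)

# Structure: column names and types
glimpse(toy)

# Missing values per column
na_counts <- colSums(is.na(toy))

# Number of rows that exactly repeat an earlier row
n_duplicates <- sum(duplicated(toy))

na_counts
n_duplicates
```

The same two checks could be run on each imported table before any cleaning step.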
I discovered that the date and time stamps were stored as character strings. Before analysis, I converted them to date-time format with lubridate and, for the hourly datasets, split them into separate date and time columns. Standardizing these variables enables consistent daily and hourly analysis across datasets.
# activity
activity <- activity %>%
  mutate(date = mdy(ActivityDate))

# intensity
intensity <- intensity %>%
  mutate(
    ActivityHour = mdy_hms(ActivityHour),
    date = as.Date(ActivityHour),
    time = format(ActivityHour, "%H:%M:%S")
  )

# calories
calories <- calories %>%
  mutate(
    ActivityHour = mdy_hms(ActivityHour),
    date = as.Date(ActivityHour),
    time = format(ActivityHour, "%H:%M:%S")
  )

# sleep
sleep <- sleep %>%
  mutate(
    SleepDay = mdy_hms(SleepDay),
    date = as.Date(SleepDay)
  )

# weight
weight <- weight %>%
  mutate(
    Date = mdy_hms(Date),
    date = as.Date(Date)
  )
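
To make the conversion concrete, here is a minimal sketch of what happens to a single timestamp. The sample string is an assumption about the raw layout (month/day/year with a 12-hour clock); the variable names are for illustration only.

```r
library(lubridate)

# One sample timestamp (assumed layout: month/day/year, 12-hour clock)
raw <- "4/12/2016 1:00:00 PM"

parsed <- mdy_hms(raw)                 # POSIXct, UTC by default
day    <- as.Date(parsed)              # calendar date only
clock  <- format(parsed, "%H:%M:%S")   # 24-hour time as text

class(parsed)
day
clock
```

Keeping the parsed `ActivityHour` column alongside `date` and `time` means the hour can still be extracted later with `hour()`.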
Having confirmed that the datasets were properly formatted, I ran an initial exploratory analysis to understand user participation, activity patterns, and data completeness across the datasets.
tibble(
  dataset = c("Activity", "Calories", "Intensity", "Sleep", "Weight"),
  participants = c(
    n_distinct(activity$Id),
    n_distinct(calories$Id),
    n_distinct(intensity$Id),
    n_distinct(sleep$Id),
    n_distinct(weight$Id)
  )
) %>%
  knitr::kable()
| dataset | participants |
|---|---|
| Activity | 33 |
| Calories | 33 |
| Intensity | 33 |
| Sleep | 24 |
| Weight | 8 |
The activity, calories, and intensity datasets each contain records from 33 unique users, indicating consistent coverage across daily and hourly activity tracking. The sleep dataset includes 24 users, while the weight dataset contains data from only 8 users. Due to the limited sample size, the weight dataset was excluded from further analysis to avoid unreliable conclusions.
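
The participation gap is easier to compare as a share of all tracked users. A small sketch using the counts from the table above; it assumes the sleep and weight users are a subset of the 33 activity users.

```r
# Participant counts taken from the table above
participants <- c(Activity = 33, Calories = 33, Intensity = 33, Sleep = 24, Weight = 8)

# Share of the 33 tracked users that appear in each dataset, in percent
coverage_pct <- round(100 * participants / max(participants), 1)
coverage_pct
```

Roughly three in four users tracked sleep, while about one in four logged weight.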
Let’s look at summary statistics for each dataset:
# activity
activity %>%
  select(TotalSteps, TotalDistance, SedentaryMinutes, Calories) %>%
  summary()
## TotalSteps TotalDistance SedentaryMinutes Calories
## Min. : 0 Min. : 0.000 Min. : 0.0 Min. : 0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8 1st Qu.:1828
## Median : 7406 Median : 5.245 Median :1057.5 Median :2134
## Mean : 7638 Mean : 5.490 Mean : 991.2 Mean :2304
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5 3rd Qu.:2793
## Max. :36019 Max. :28.030 Max. :1440.0 Max. :4900
# active minutes per intensity category
activity %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  summary()
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0
## Median : 4.00 Median : 6.00 Median :199.0
## Mean : 21.16 Mean : 13.56 Mean :192.8
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0
## Max. :210.00 Max. :143.00 Max. :518.0
# calories
summary(calories$Calories)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 42.00 63.00 83.00 97.39 108.00 948.00
# sleep
sleep %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.000 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.000 1st Qu.:361.0 1st Qu.:403.0
## Median :1.000 Median :433.0 Median :463.0
## Mean :1.119 Mean :419.5 Mean :458.6
## 3rd Qu.:1.000 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.000 Max. :796.0 Max. :961.0
# weight (shown for completeness; only 8 users)
weight %>%
  select(WeightKg, BMI) %>%
  summary()
## WeightKg BMI
## Min. : 52.60 Min. :21.45
## 1st Qu.: 61.40 1st Qu.:23.96
## Median : 62.50 Median :24.39
## Mean : 72.04 Mean :25.19
## 3rd Qu.: 85.05 3rd Qu.:25.56
## Max. :133.50 Max. :47.54
Summary statistics revealed wide variation in daily steps, calories burned, and sedentary time, suggesting differing activity levels among users.
Average sedentary time is 991 minutes (about 16.5 hours) per day, meaning most of the day was spent in sedentary or lightly active states, while very active time was comparatively low (about 21 minutes per day on average).
According to guidance aligned with CDC physical activity recommendations, health benefits for adults under 60 are commonly observed at approximately 8,000–10,000 steps per day. In this dataset, the average daily step count is 7,638 steps, which falls slightly below this range, indicating potential opportunities to encourage increased daily movement.
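
A mean alone hides how often individual days actually reach the target. The sketch below shows one way to compute the share of days at or above 8,000 steps per user. It runs on made-up toy data, and the names `toy_days` and `step_target` are assumptions for illustration.

```r
library(dplyr)
library(tibble)

# Toy daily records (made-up values, not the Fitbit data)
toy_days <- tibble(
  Id = c(1, 1, 1, 2, 2, 2),
  TotalSteps = c(9500, 7200, 8100, 3000, 4500, 10200)
)

step_target <- 8000  # lower bound of the range cited above

# Share of each user's days that reach the target
target_share <- toy_days %>%
  group_by(Id) %>%
  summarise(
    days = n(),
    days_on_target = sum(TotalSteps >= step_target),
    share_on_target = days_on_target / days,
    .groups = "drop"
  )

target_share

# Converting the reported mean sedentary time from minutes to hours
sedentary_hours <- 991.2 / 60
sedentary_hours
```

Applied to the real `activity` table, the same pattern would show whether the shortfall comes from a few inactive users or from most users missing the target on some days.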
Hourly intensity data was aggregated to daily values to align with the daily activity dataset and enable meaningful comparisons across datasets.
daily_intensity <- intensity %>%
  group_by(Id, date) %>%
  summarise(
    mean_intensity = mean(TotalIntensity, na.rm = TRUE),
    max_intensity = max(TotalIntensity, na.rm = TRUE),
    total_intensity = sum(TotalIntensity, na.rm = TRUE),
    .groups = "drop"
  )
I joined the activity and daily intensity tables by Id and date:
daily_activity <- activity %>%
  left_join(daily_intensity, by = c("Id", "date"))
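
After a join it is worth confirming that no rows were duplicated and seeing which rows found no match. This is a minimal sketch on made-up toy tables that share the `Id` and `date` keys; the `toy_*` names and values are assumptions.

```r
library(dplyr)
library(tibble)

# Toy tables (made-up values) with the same key columns as the real ones
toy_activity <- tibble(
  Id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(100, 200, 300)
)
toy_intensity <- tibble(
  Id = c(1, 2),
  date = as.Date(c("2016-04-12", "2016-04-12")),
  total_intensity = c(50, 70)
)

joined <- toy_activity %>%
  left_join(toy_intensity, by = c("Id", "date"))

# With unique keys on the right, a left join keeps the left table's row count
nrow(joined) == nrow(toy_activity)

# Rows in the left table with no partner on the right
unmatched <- toy_activity %>%
  anti_join(toy_intensity, by = c("Id", "date"))
unmatched
```

Unmatched rows keep `NA` in the joined intensity columns, so they need `na.rm = TRUE` or an explicit filter in later summaries.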
I also joined the activity dataset with the sleep dataset by Id and date, keeping only days that appear in both:
activity_sleep <- activity %>%
  inner_join(sleep, by = c("Id", "date"))
n_distinct(activity_sleep$Id)
## [1] 24
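# Drop days with no recorded steps or calories and days logged as fully
# sedentary (1440 minutes), which likely mean the device was not worn
daily_activity_clean <- daily_activity %>%
  filter(
    TotalSteps > 0,
    Calories > 0,
    SedentaryMinutes < 1440
  )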
# Distribution of Daily Steps
ggplot(daily_activity_clean, aes(TotalSteps)) +
  geom_histogram(bins = 30, fill = "steelblue", colour = "black") +
  labs(
    title = "Distribution of Daily Steps",
    x = "Total Steps",
    y = "Number of Days"
  ) +
  theme_minimal()
This chart shows a wide spread in daily step counts across users, indicating substantial variation in activity levels. While some users are highly active, many record relatively low step counts on most days.
ggplot(daily_activity_clean, aes(TotalSteps, Calories)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Relationship Between Steps and Calories Burned",
    x = "Total Steps",
    y = "Calories"
  ) +
  theme_minimal()
A clear positive relationship exists between total daily steps and calories burned. Users who take more steps tend to expend more energy, which supports using steps as a simple indicator of daily activity.
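
The strength of this relationship can be quantified rather than read off the plot. A minimal sketch on made-up numbers, using the Pearson correlation and the slope of a simple linear model:

```r
# Toy paired observations (made-up values, not the Fitbit data)
steps    <- c(2000, 4000, 6000, 8000, 10000, 12000)
calories <- c(1700, 1850, 2100, 2200, 2450, 2600)

# Pearson correlation: strength of the linear association
r <- cor(steps, calories)

# Simple linear model: the slope estimates extra calories per additional step
fit <- lm(calories ~ steps)
slope <- coef(fit)[["steps"]]

r
slope
```

On the real data the equivalent call would be `cor(daily_activity_clean$TotalSteps, daily_activity_clean$Calories)`. A high correlation describes association only, since total calories also include energy burned at rest.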
daily_activity_clean %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  pivot_longer(everything(), names_to = "ActivityType", values_to = "Minutes") %>%
  ggplot(aes(ActivityType, Minutes)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Active Minutes by Intensity Level",
    x = "Activity Type",
    y = "Minutes"
  ) +
  theme_minimal()
Lightly active minutes dominate daily activity, while time spent in higher-intensity activity is comparatively low. This suggests opportunities to encourage short bursts of moderate-to-vigorous activity.
intensity %>%
  mutate(hour = hour(ActivityHour)) %>%
  group_by(hour) %>%
  summarise(avg_intensity = mean(TotalIntensity, na.rm = TRUE)) %>%
  ggplot(aes(hour, avg_intensity)) +
  geom_line() +
  labs(
    title = "Average Activity Intensity by Hour of Day",
    x = "Hour of Day",
    y = "Average Intensity"
  )
Activity intensity follows a daily pattern, peaking during daytime hours and declining at night. This reflects typical daily routines and highlights optimal periods for activity-based engagement or reminders.
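
To turn the curve into a concrete reminder window, the peak hour can be extracted from the hourly averages. A small sketch on made-up values; `toy_hourly` is an assumed name standing in for the summarised hourly table.

```r
library(dplyr)
library(tibble)

# Toy hourly averages (made-up values)
toy_hourly <- tibble(
  hour = c(0, 6, 12, 18, 22),
  avg_intensity = c(2.1, 5.0, 19.8, 21.5, 6.3)
)

# Hour with the highest average intensity
peak <- toy_hourly %>%
  slice_max(avg_intensity, n = 1)
peak
```

Raising `n` returns the top few hours, which is closer to a window than a single point.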
The distribution of daily steps shows that while some users are highly active, many record relatively low step counts on most days. This indicates uneven engagement with physical activity and potential opportunities to encourage more consistent movement.
There is a clear positive relationship between total steps and calories burned. Users who take more steps tend to burn more calories, reinforcing the value of step-based activity as a simple and effective health metric.
Lightly active and sedentary minutes dominate daily activity patterns, while very active minutes are comparatively limited. This suggests that many users engage in movement but may struggle to reach higher-intensity activity levels.
Average activity intensity peaks during daytime hours and declines sharply at night. This reflects typical daily routines and highlights opportunities for timely nudges during peak activity windows.
Sleep duration varies considerably among users who track sleep. While this dataset is smaller, it provides useful context for understanding how activity and rest may interact for engaged users.
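
For context, the means from the sleep summary can be converted into more readable figures. This is plain arithmetic on the reported values. The ratio at the end is a ratio of means, not an average of per-night ratios, so it is only a rough indicator.

```r
# Means reported in the sleep summary, in minutes
mean_asleep_min <- 419.5
mean_in_bed_min <- 458.6

hours_asleep  <- mean_asleep_min / 60                # average hours asleep
awake_in_bed  <- mean_in_bed_min - mean_asleep_min   # minutes in bed but not asleep
asleep_ratio  <- mean_asleep_min / mean_in_bed_min   # rough share of bed time spent asleep

round(hours_asleep, 2)
awake_in_bed
round(asleep_ratio, 3)
```

On average, users who track sleep get just under 7 hours and spend roughly 39 minutes in bed awake.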
Since steps strongly correlate with calorie burn, Bellabeat can emphasize personalized daily step goals that adapt to user activity history rather than fixed targets. This lowers the barrier for less active users while still motivating improvement.
Given the dominance of light activity, Bellabeat can introduce micro-workout prompts (5–10 minutes) that encourage users to transition from light to moderate or vigorous activity, especially during peak daytime hours.
Activity intensity patterns suggest optimal times for engagement. Bellabeat can send context-aware reminders during periods when users are most likely to be active, increasing the effectiveness of in-app nudges.
For users who track sleep, Bellabeat can tailor activity suggestions based on sleep duration, promoting recovery-aware fitness guidance and reinforcing Bellabeat’s holistic health positioning.
Only 24 of the 33 users tracked sleep and only 8 logged weight, which suggests an opportunity to educate users on the benefits of logging these metrics. Clear messaging on how these features enhance personalized insights could increase adoption.
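
As an illustration of the adaptive step goals suggested above, here is one possible rule sketched on made-up data: set each user's goal slightly above their typical day, with a ceiling. The 10% uplift, the 10,000-step cap, and all names are hypothetical choices, not something derived from the Fitbit data or from Bellabeat's product.

```r
library(dplyr)
library(tibble)

# Toy step history (made-up values)
toy_history <- tibble(
  Id = c(1, 1, 1, 2, 2, 2),
  TotalSteps = c(3000, 4000, 5000, 9000, 11000, 12000)
)

uplift   <- 1.10    # hypothetical: aim 10% above the user's typical day
goal_cap <- 10000   # hypothetical ceiling, top of the step range cited earlier

# Typical day = median steps; goal = uplifted median, rounded to 100, capped
step_goals <- toy_history %>%
  group_by(Id) %>%
  summarise(typical_steps = median(TotalSteps), .groups = "drop") %>%
  mutate(goal = pmin(round(typical_steps * uplift, -2), goal_cap))

step_goals
```

The less active user gets a goal within reach of their current habits, while the already active user is held at the cap instead of being pushed ever higher.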
These insights highlight opportunities for Bellabeat to increase user engagement by promoting achievable activity goals, timely interventions, and personalized health guidance aligned with real-world user behavior.
Thank you for your interest in my Bellabeat Case Study!
This project represents my first end-to-end case study using R for data cleaning, analysis, and visualization. I welcome feedback and suggestions for further improvement.