Ask
Bellabeat, a wellness technology company, has provided Fitbit fitness
tracker data and asked us to analyze smart device usage trends. The goal
is to focus on one Bellabeat product and use data-driven insights to
guide marketing strategy. Key questions include:
- What trends can we observe in smart device (Fitbit) usage data?
- How do these trends relate to Bellabeat’s target customers and products?
- How can these insights influence Bellabeat’s marketing and product decisions?
This analysis will follow the six-step Google Data Analytics
framework: Ask, Prepare, Process, Analyze, Share, and
Act. We start by understanding the business task and data
sources.
Prepare
We will use several of the Fitbit data files provided, including:
- dailyActivity_merged.csv (daily activity metrics: steps, distance, active minutes, calories burned)
- sleepDay_merged.csv (daily sleep records: minutes asleep, time in bed)
- weightLogInfo_merged.csv (user weight and BMI logs)
- hourlySteps_merged.csv (hourly step counts for each day)
These CSV files contain data from Fitbit users across two consecutive
export periods, 3.12.16-4.11.16 and 4.12.16-5.12.16 (roughly one month
each). We load each file using the tidyverse read_csv() function and
inspect the data.
library(tidyverse)
library(lubridate)
library(knitr)
# Load daily activity
daily_activity1 <- read_csv("mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/dailyActivity_merged.csv")
daily_activity2 <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
daily_activity <- bind_rows(daily_activity1, daily_activity2)
# Load sleep data (only available in second month)
sleep_data <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
# Load weight logs
weight_log1 <- read_csv("mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/weightLogInfo_merged.csv")
weight_log2 <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
weight_log <- bind_rows(weight_log1, weight_log2)
# Load hourly steps
hourly_steps1 <- read_csv("mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/hourlySteps_merged.csv")
hourly_steps2 <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
hourly_steps <- bind_rows(hourly_steps1, hourly_steps2)
# Display tidy previews
kable(head(daily_activity, 10), caption = "Preview: Daily Activity Data")
Preview: Daily Activity Data
| Id | ActivityDate | TotalSteps | TotalDistance | TrackerDistance | LoggedActivitiesDistance | VeryActiveDistance | ModeratelyActiveDistance | LightActiveDistance | SedentaryActiveDistance | VeryActiveMinutes | FairlyActiveMinutes | LightlyActiveMinutes | SedentaryMinutes | Calories |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1503960366 | 3/25/2016 | 11004 | 7.11 | 7.11 | 0 | 2.57 | 0.46 | 4.07 | 0 | 33 | 12 | 205 | 804 | 1819 |
| 1503960366 | 3/26/2016 | 17609 | 11.55 | 11.55 | 0 | 6.92 | 0.73 | 3.91 | 0 | 89 | 17 | 274 | 588 | 2154 |
| 1503960366 | 3/27/2016 | 12736 | 8.53 | 8.53 | 0 | 4.66 | 0.16 | 3.71 | 0 | 56 | 5 | 268 | 605 | 1944 |
| 1503960366 | 3/28/2016 | 13231 | 8.93 | 8.93 | 0 | 3.19 | 0.79 | 4.95 | 0 | 39 | 20 | 224 | 1080 | 1932 |
| 1503960366 | 3/29/2016 | 12041 | 7.85 | 7.85 | 0 | 2.16 | 1.09 | 4.61 | 0 | 28 | 28 | 243 | 763 | 1886 |
| 1503960366 | 3/30/2016 | 10970 | 7.16 | 7.16 | 0 | 2.36 | 0.51 | 4.29 | 0 | 30 | 13 | 223 | 1174 | 1820 |
| 1503960366 | 3/31/2016 | 12256 | 7.86 | 7.86 | 0 | 2.29 | 0.49 | 5.04 | 0 | 33 | 12 | 239 | 820 | 1889 |
| 1503960366 | 4/1/2016 | 12262 | 7.87 | 7.87 | 0 | 3.32 | 0.83 | 3.64 | 0 | 47 | 21 | 200 | 866 | 1868 |
| 1503960366 | 4/2/2016 | 11248 | 7.25 | 7.25 | 0 | 3.00 | 0.45 | 3.74 | 0 | 40 | 11 | 244 | 636 | 1843 |
| 1503960366 | 4/3/2016 | 10016 | 6.37 | 6.37 | 0 | 0.91 | 1.28 | 4.18 | 0 | 15 | 30 | 314 | 655 | 1850 |
kable(head(sleep_data, 10), caption = "Preview: Sleep Data (only Apr–May)")
Preview: Sleep Data (only Apr–May)
| Id | SleepDay | TotalSleepRecords | TotalMinutesAsleep | TotalTimeInBed |
|---|---|---|---|---|
| 1503960366 | 4/12/2016 12:00:00 AM | 1 | 327 | 346 |
| 1503960366 | 4/13/2016 12:00:00 AM | 2 | 384 | 407 |
| 1503960366 | 4/15/2016 12:00:00 AM | 1 | 412 | 442 |
| 1503960366 | 4/16/2016 12:00:00 AM | 2 | 340 | 367 |
| 1503960366 | 4/17/2016 12:00:00 AM | 1 | 700 | 712 |
| 1503960366 | 4/19/2016 12:00:00 AM | 1 | 304 | 320 |
| 1503960366 | 4/20/2016 12:00:00 AM | 1 | 360 | 377 |
| 1503960366 | 4/21/2016 12:00:00 AM | 1 | 325 | 364 |
| 1503960366 | 4/23/2016 12:00:00 AM | 1 | 361 | 384 |
| 1503960366 | 4/24/2016 12:00:00 AM | 1 | 430 | 449 |
kable(head(weight_log, 10), caption = "Preview: Weight Log Data")
Preview: Weight Log Data
| Id | Date | WeightKg | WeightPounds | Fat | BMI | IsManualReport | LogId |
|---|---|---|---|---|---|---|---|
| 1503960366 | 4/5/2016 11:59:59 PM | 53.3 | 117.5064 | 22 | 22.97 | TRUE | 1.459901e+12 |
| 1927972279 | 4/10/2016 6:33:26 PM | 129.6 | 285.7191 | NA | 46.17 | FALSE | 1.460313e+12 |
| 2347167796 | 4/3/2016 11:59:59 PM | 63.4 | 139.7731 | 10 | 24.77 | TRUE | 1.459728e+12 |
| 2873212765 | 4/6/2016 11:59:59 PM | 56.7 | 125.0021 | NA | 21.45 | TRUE | 1.459987e+12 |
| 2873212765 | 4/7/2016 11:59:59 PM | 57.2 | 126.1044 | NA | 21.65 | TRUE | 1.460074e+12 |
| 2891001357 | 4/5/2016 11:59:59 PM | 88.4 | 194.8886 | NA | 25.03 | TRUE | 1.459901e+12 |
| 4445114986 | 3/30/2016 11:59:59 PM | 92.4 | 203.7071 | NA | 35.01 | TRUE | 1.459382e+12 |
| 4558609924 | 4/8/2016 11:59:59 PM | 69.4 | 153.0008 | NA | 27.14 | TRUE | 1.460160e+12 |
| 4702921684 | 4/4/2016 11:59:59 PM | 99.7 | 219.8009 | NA | 26.11 | TRUE | 1.459814e+12 |
| 6962181067 | 3/30/2016 11:59:59 PM | 61.5 | 135.5843 | NA | 24.03 | TRUE | 1.459382e+12 |
kable(head(hourly_steps, 10), caption = "Preview: Hourly Steps Data")
Preview: Hourly Steps Data
| Id | ActivityHour | StepTotal |
|---|---|---|
| 1503960366 | 3/12/2016 12:00:00 AM | 0 |
| 1503960366 | 3/12/2016 1:00:00 AM | 0 |
| 1503960366 | 3/12/2016 2:00:00 AM | 0 |
| 1503960366 | 3/12/2016 3:00:00 AM | 0 |
| 1503960366 | 3/12/2016 4:00:00 AM | 0 |
| 1503960366 | 3/12/2016 5:00:00 AM | 0 |
| 1503960366 | 3/12/2016 6:00:00 AM | 0 |
| 1503960366 | 3/12/2016 7:00:00 AM | 0 |
| 1503960366 | 3/12/2016 8:00:00 AM | 0 |
| 1503960366 | 3/12/2016 9:00:00 AM | 8 |
Note:
The sleepDay_merged.csv dataset was only available in the
April-May (4.12.16-5.12.16) export folder. This means the sleep analysis
is based on a smaller sample than the activity and step data, which
span both March-April and April-May. This limitation is taken into
account when interpreting trends.
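To make the coverage gap concrete, the number of users and the date range of each table can be compared side by side. The following is a minimal sketch on made-up rows; the `_demo` data frames and the `coverage()` helper are hypothetical names introduced only for illustration, and the same helper could be pointed at the real tables once their date columns have been parsed.

```r
library(dplyr)

# Toy stand-ins for the real tables (values are made up for illustration)
daily_activity_demo <- data.frame(
  id = c(1, 1, 2, 2, 3),
  activity_date = as.Date(c("2016-03-25", "2016-04-20", "2016-03-26",
                            "2016-05-01", "2016-04-15"))
)
sleep_demo <- data.frame(
  id = c(1, 1, 2),
  activity_date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-15"))
)

# Hypothetical helper: one row of coverage statistics per dataset
coverage <- function(df, label) {
  df %>%
    summarise(
      dataset   = label,
      users     = n_distinct(id),
      first_day = min(activity_date),
      last_day  = max(activity_date),
      rows      = n()
    )
}

coverage_table <- bind_rows(
  coverage(daily_activity_demo, "Daily Activity"),
  coverage(sleep_demo, "Sleep")
)
coverage_table
```

A table like this shows at a glance which datasets cover fewer users or fewer days, which is the kind of imbalance the note describes.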
Process
The Process phase is about cleaning and preparing the
datasets before analysis. For the Bellabeat/Fitbit case,
we:
- Convert dates/times into proper formats.
- Remove duplicates.
- Check for missing values.
- Standardize column names.
library(janitor)
# Clean column names to snake_case for consistency
daily_activity <- clean_names(daily_activity)
sleep_data <- clean_names(sleep_data)
weight_log <- clean_names(weight_log)
hourly_steps <- clean_names(hourly_steps)
# Convert date columns to proper date formats
daily_activity$activity_date <- mdy(daily_activity$activity_date)
sleep_data$sleep_day <- mdy_hms(sleep_data$sleep_day)
weight_log$date <- mdy_hms(weight_log$date)
hourly_steps$activity_hour <- mdy_hms(hourly_steps$activity_hour)
# Remove duplicates
daily_activity <- distinct(daily_activity)
sleep_data <- distinct(sleep_data)
weight_log <- distinct(weight_log)
hourly_steps <- distinct(hourly_steps)
# Check missing values
missing_summary <- tibble(
  dataset = c("Daily Activity", "Sleep", "Weight Log", "Hourly Steps"),
  missing = c(
    sum(is.na(daily_activity)),
    sum(is.na(sleep_data)),
    sum(is.na(weight_log)),
    sum(is.na(hourly_steps))
  )
)
# Display summary tables
kable(head(daily_activity, 10), caption = "Cleaned Daily Activity Data (first 10 rows)")
Cleaned Daily Activity Data (first 10 rows)
| id | activity_date | total_steps | total_distance | tracker_distance | logged_activities_distance | very_active_distance | moderately_active_distance | light_active_distance | sedentary_active_distance | very_active_minutes | fairly_active_minutes | lightly_active_minutes | sedentary_minutes | calories |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1503960366 | 2016-03-25 | 11004 | 7.11 | 7.11 | 0 | 2.57 | 0.46 | 4.07 | 0 | 33 | 12 | 205 | 804 | 1819 |
| 1503960366 | 2016-03-26 | 17609 | 11.55 | 11.55 | 0 | 6.92 | 0.73 | 3.91 | 0 | 89 | 17 | 274 | 588 | 2154 |
| 1503960366 | 2016-03-27 | 12736 | 8.53 | 8.53 | 0 | 4.66 | 0.16 | 3.71 | 0 | 56 | 5 | 268 | 605 | 1944 |
| 1503960366 | 2016-03-28 | 13231 | 8.93 | 8.93 | 0 | 3.19 | 0.79 | 4.95 | 0 | 39 | 20 | 224 | 1080 | 1932 |
| 1503960366 | 2016-03-29 | 12041 | 7.85 | 7.85 | 0 | 2.16 | 1.09 | 4.61 | 0 | 28 | 28 | 243 | 763 | 1886 |
| 1503960366 | 2016-03-30 | 10970 | 7.16 | 7.16 | 0 | 2.36 | 0.51 | 4.29 | 0 | 30 | 13 | 223 | 1174 | 1820 |
| 1503960366 | 2016-03-31 | 12256 | 7.86 | 7.86 | 0 | 2.29 | 0.49 | 5.04 | 0 | 33 | 12 | 239 | 820 | 1889 |
| 1503960366 | 2016-04-01 | 12262 | 7.87 | 7.87 | 0 | 3.32 | 0.83 | 3.64 | 0 | 47 | 21 | 200 | 866 | 1868 |
| 1503960366 | 2016-04-02 | 11248 | 7.25 | 7.25 | 0 | 3.00 | 0.45 | 3.74 | 0 | 40 | 11 | 244 | 636 | 1843 |
| 1503960366 | 2016-04-03 | 10016 | 6.37 | 6.37 | 0 | 0.91 | 1.28 | 4.18 | 0 | 15 | 30 | 314 | 655 | 1850 |
kable(head(sleep_data, 10), caption = "Cleaned Sleep Data (first 10 rows)")
Cleaned Sleep Data (first 10 rows)
| id | sleep_day | total_sleep_records | total_minutes_asleep | total_time_in_bed |
|---|---|---|---|---|
| 1503960366 | 2016-04-12 | 1 | 327 | 346 |
| 1503960366 | 2016-04-13 | 2 | 384 | 407 |
| 1503960366 | 2016-04-15 | 1 | 412 | 442 |
| 1503960366 | 2016-04-16 | 2 | 340 | 367 |
| 1503960366 | 2016-04-17 | 1 | 700 | 712 |
| 1503960366 | 2016-04-19 | 1 | 304 | 320 |
| 1503960366 | 2016-04-20 | 1 | 360 | 377 |
| 1503960366 | 2016-04-21 | 1 | 325 | 364 |
| 1503960366 | 2016-04-23 | 1 | 361 | 384 |
| 1503960366 | 2016-04-24 | 1 | 430 | 449 |
kable(head(weight_log, 10), caption = "Cleaned Weight Log Data (first 10 rows)")
Cleaned Weight Log Data (first 10 rows)
| id | date | weight_kg | weight_pounds | fat | bmi | is_manual_report | log_id |
|---|---|---|---|---|---|---|---|
| 1503960366 | 2016-04-05 23:59:59 | 53.3 | 117.5064 | 22 | 22.97 | TRUE | 1.459901e+12 |
| 1927972279 | 2016-04-10 18:33:26 | 129.6 | 285.7191 | NA | 46.17 | FALSE | 1.460313e+12 |
| 2347167796 | 2016-04-03 23:59:59 | 63.4 | 139.7731 | 10 | 24.77 | TRUE | 1.459728e+12 |
| 2873212765 | 2016-04-06 23:59:59 | 56.7 | 125.0021 | NA | 21.45 | TRUE | 1.459987e+12 |
| 2873212765 | 2016-04-07 23:59:59 | 57.2 | 126.1044 | NA | 21.65 | TRUE | 1.460074e+12 |
| 2891001357 | 2016-04-05 23:59:59 | 88.4 | 194.8886 | NA | 25.03 | TRUE | 1.459901e+12 |
| 4445114986 | 2016-03-30 23:59:59 | 92.4 | 203.7071 | NA | 35.01 | TRUE | 1.459382e+12 |
| 4558609924 | 2016-04-08 23:59:59 | 69.4 | 153.0008 | NA | 27.14 | TRUE | 1.460160e+12 |
| 4702921684 | 2016-04-04 23:59:59 | 99.7 | 219.8009 | NA | 26.11 | TRUE | 1.459814e+12 |
| 6962181067 | 2016-03-30 23:59:59 | 61.5 | 135.5843 | NA | 24.03 | TRUE | 1.459382e+12 |
kable(head(hourly_steps, 10), caption = "Cleaned Hourly Steps Data (first 10 rows)")
Cleaned Hourly Steps Data (first 10 rows)
| id | activity_hour | step_total |
|---|---|---|
| 1503960366 | 2016-03-12 00:00:00 | 0 |
| 1503960366 | 2016-03-12 01:00:00 | 0 |
| 1503960366 | 2016-03-12 02:00:00 | 0 |
| 1503960366 | 2016-03-12 03:00:00 | 0 |
| 1503960366 | 2016-03-12 04:00:00 | 0 |
| 1503960366 | 2016-03-12 05:00:00 | 0 |
| 1503960366 | 2016-03-12 06:00:00 | 0 |
| 1503960366 | 2016-03-12 07:00:00 | 0 |
| 1503960366 | 2016-03-12 08:00:00 | 0 |
| 1503960366 | 2016-03-12 09:00:00 | 8 |
kable(missing_summary, caption = "Missing Values Summary Across Datasets")
Missing Values Summary Across Datasets
| dataset | missing |
|---|---|
| Daily Activity | 0 |
| Sleep | 0 |
| Weight Log | 94 |
| Hourly Steps | 0 |
Notes:
- Date columns were successfully converted into R date and date-time
objects, so time-based analysis will be accurate.
- Duplicate records were removed to avoid skewing results.
- Three of the four datasets have no missing values. The weight log has
94; the preview rows suggest these sit in the body-fat column, which is
expected because not all users consistently record every measurement.
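A single total per table does not show where the gaps are, so a per-column count can help confirm the point about the weight log. The sketch below uses a made-up data frame (`weight_log_demo` is a hypothetical name, and its values are invented) and base R only; it also counts the fully duplicated rows that distinct() would drop.

```r
# Made-up rows mimicking the shape of the cleaned weight log
weight_log_demo <- data.frame(
  id        = c(1, 2, 2, 3, 3),
  weight_kg = c(53.3, 129.6, 129.6, 63.4, 63.4),
  fat       = c(22, NA, NA, NA, NA),
  bmi       = c(22.97, 46.17, 46.17, 24.77, 24.77)
)

# Missing values per column rather than one total for the whole table
na_by_column <- colSums(is.na(weight_log_demo))
na_by_column

# Number of rows that exactly repeat an earlier row
n_duplicates <- sum(duplicated(weight_log_demo))
n_duplicates
```

Run against the real tables, the same two calls would show which column accounts for the missing values and how many rows the de-duplication step removed.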
Analyze
In this phase, we explore trends and relationships in the Fitbit data
to uncover insights about user behavior.
The analysis focuses on daily activity, hourly steps, and sleep, as
these are most relevant to Bellabeat’s wellness products and app.
library(ggplot2)
# --- Daily Activity Summary ---
activity_summary <- daily_activity %>%
  summarise(
    avg_steps = mean(total_steps, na.rm = TRUE),
    avg_calories = mean(calories, na.rm = TRUE),
    avg_sedentary = mean(sedentary_minutes, na.rm = TRUE),
    avg_active = mean(very_active_minutes, na.rm = TRUE)
  )
kable(activity_summary, caption = "Average Daily Activity Metrics")
Average Daily Activity Metrics
| avg_steps | avg_calories | avg_sedentary | avg_active |
|---|---|---|---|
| 7280.898 | 2266.266 | 992.5426 | 19.67931 |
# --- Correlation: Steps vs Calories ---
steps_calories_plot <- ggplot(daily_activity, aes(x = total_steps, y = calories)) +
  geom_point(alpha = 0.5, color = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "darkred") +
  labs(
    title = "Relationship Between Daily Steps and Calories Burned",
    x = "Total Steps",
    y = "Calories Burned"
  ) +
  theme_minimal()
steps_calories_plot

# --- Average Steps by Hour ---
hourly_steps_summary <- hourly_steps %>%
  mutate(hour = hour(activity_hour)) %>%
  group_by(hour) %>%
  summarise(avg_steps = mean(step_total, na.rm = TRUE))
kable(hourly_steps_summary, caption = "Average Steps by Hour of Day")
Average Steps by Hour of Day
| hour | avg_steps |
|---|---|
| 0 | 43.361240 |
| 1 | 21.884178 |
| 2 | 13.694416 |
| 3 | 6.850492 |
| 4 | 11.108752 |
| 5 | 34.926463 |
| 6 | 148.241969 |
| 7 | 282.654922 |
| 8 | 395.841451 |
| 9 | 431.373057 |
| 10 | 453.856179 |
| 11 | 454.740644 |
| 12 | 534.259124 |
| 13 | 496.229004 |
| 14 | 506.020344 |
| 15 | 398.062304 |
| 16 | 470.961619 |
| 17 | 499.712105 |
| 18 | 550.265929 |
| 19 | 554.885729 |
| 20 | 377.628888 |
| 21 | 283.504747 |
| 22 | 204.010032 |
| 23 | 112.085140 |
hourly_steps_plot <- ggplot(hourly_steps_summary, aes(x = hour, y = avg_steps)) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_line(color = "forestgreen", linewidth = 1) +
  labs(
    title = "Hourly Average Step Trends",
    x = "Hour of Day",
    y = "Average Steps"
  ) +
  theme_minimal()
hourly_steps_plot

# --- Sleep Summary ---
sleep_summary <- sleep_data %>%
  summarise(
    avg_minutes_asleep = mean(total_minutes_asleep, na.rm = TRUE),
    avg_time_in_bed = mean(total_time_in_bed, na.rm = TRUE)
  )
kable(sleep_summary, caption = "Average Sleep Metrics")
Average Sleep Metrics
| avg_minutes_asleep | avg_time_in_bed |
|---|---|
| 419.1732 | 458.4829 |
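# --- Correlation: Sleep vs Activity ---
# Match each sleep record to the same user's activity on the same calendar day.
# Joining on id alone would pair every sleep record with every activity day.
sleep_activity <- sleep_data %>%
  mutate(activity_date = as_date(sleep_day)) %>%
  inner_join(daily_activity, by = c("id", "activity_date"))
sleep_activity_plot <- ggplot(
  sleep_activity,
  aes(x = total_minutes_asleep, y = total_steps)
) +
  geom_point(alpha = 0.5, color = "purple") +
  geom_smooth(method = "lm", se = FALSE, color = "darkorange") +
  labs(
    title = "Relationship Between Sleep Duration and Daily Steps",
    x = "Minutes Asleep",
    y = "Total Steps"
  ) +
  theme_minimal()
sleep_activity_plot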
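One thing to watch when combining sleep and activity records is the join key. The sketch below uses made-up rows (`sleep_demo` and `activity_demo` are hypothetical names, and the values are invented) to show how the row count differs between joining on id alone and joining on id plus calendar date. It assumes the sleep timestamp can be reduced to a date with as.Date().

```r
library(dplyr)

# Two nights of sleep and three days of activity for one made-up user
sleep_demo <- data.frame(
  id = c(1, 1),
  sleep_day = as.POSIXct(c("2016-04-12 00:00:00", "2016-04-13 00:00:00"), tz = "UTC"),
  total_minutes_asleep = c(327, 384)
)
activity_demo <- data.frame(
  id = c(1, 1, 1),
  activity_date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-14")),
  total_steps = c(9000, 11000, 7000)
)

# Joining on id alone pairs every sleep record with every activity day
# (recent dplyr versions may warn about a many-to-many relationship)
by_id_only <- inner_join(sleep_demo, activity_demo, by = "id")

# Joining on id and calendar date keeps one row per user-day
by_id_and_date <- sleep_demo %>%
  mutate(activity_date = as.Date(sleep_day)) %>%
  inner_join(activity_demo, by = c("id", "activity_date"))

c(id_only = nrow(by_id_only), id_and_date = nrow(by_id_and_date))
```

With the id-only key, each night of sleep is plotted against steps from unrelated days, which inflates the number of points and blurs any real relationship.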

Notes:
- Daily activity summary: Shows typical activity levels among tracker
users (about 7,281 steps and 993 sedentary minutes per day).
- Steps vs calories: Shows that higher step counts go with higher
calorie burn.
- Hourly steps: Reveals peak activity times (around noon and
18:00-19:00 in this data), which is useful for targeted
notifications and reminders in the app.
- Sleep summary: Shows whether users meet the recommended 7-9
hours of sleep, which can inform Bellabeat’s sleep tracking features.
- Sleep vs activity: Examines whether better sleep aligns
with higher activity, reinforcing holistic health messaging.
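The steps-versus-calories relationship can also be summarized with a number instead of a plot. As a minimal sketch, the block below computes a Pearson correlation and a fitted slope for the ten daily activity rows shown in the preview tables. Those rows come from a single user, so the result illustrates the calculation and is not an estimate for the full dataset; `preview` is a hypothetical name.

```r
# Steps and calories from the ten daily activity rows previewed earlier
preview <- data.frame(
  total_steps = c(11004, 17609, 12736, 13231, 12041,
                  10970, 12256, 12262, 11248, 10016),
  calories    = c(1819, 2154, 1944, 1932, 1886,
                  1820, 1889, 1868, 1843, 1850)
)

# Pearson correlation between daily steps and calories
r <- cor(preview$total_steps, preview$calories)

# Slope of the fitted line: extra calories associated with one more step
fit <- lm(calories ~ total_steps, data = preview)
calories_per_step <- unname(coef(fit)["total_steps"])

round(c(r = r, calories_per_step = calories_per_step), 3)
```

Replacing `preview` with the full daily_activity table would give the correlation behind the trend line in the scatter plot.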
Share
In this phase, we summarize key insights from the analysis and
highlight how they relate to Bellabeat’s business objectives.
The findings are presented with visualizations and tables for
clarity.
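Key Daily Activity Trends
| avg_steps | avg_calories | avg_sedentary | avg_active |
|---|---|---|---|
| 7280.898 | 2266.266 | 992.5426 | 19.67931 |

Key Sleep Trends
| avg_minutes_asleep | avg_time_in_bed |
|---|---|
| 419.1732 | 458.4829 |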


Key Insights for Bellabeat:
- Daily Activity:
  - Users average around 7,281 steps and about 2,266 calories per day.
  - However, sedentary time is high (about 993 minutes, or roughly 16.5
    hours, per day), showing an opportunity for reminder features
    (nudges for movement).
- Hourly Activity:
  - Average steps peak around noon and again at 18:00-19:00, with a
    dip around 15:00.
  - Bellabeat could schedule app notifications or coaching
    tips during less active hours to boost engagement.
- Calories vs Steps:
  - A clear positive correlation shows that more steps go with
    more calories burned.
  - Bellabeat can promote its ability to track calorie burn in real
    time, motivating users to hit daily step goals.
- Sleep Patterns:
  - Average sleep is around 7 hours (419 minutes), right at the lower
    bound of the recommended 7-9 hours.
  - Bellabeat can promote its sleep tracking features
    and highlight the benefits of consistent rest.
- Sleep vs Activity:
  - Users who sleep longer generally show better activity
    levels.
  - This supports Bellabeat’s positioning as a holistic wellness
    tracker, not just a fitness device.
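The headline figures in these insights can be derived directly from the summary tables. The sketch below re-enters the hourly averages and the daily and sleep means from the tables above (the variable names are hypothetical) and computes the busiest hours, the quietest hour between the two peaks, and a few unit conversions.

```r
# Average steps per hour, copied from the "Average Steps by Hour of Day" table
hourly_avg <- data.frame(
  hour = 0:23,
  avg_steps = c(43.361240, 21.884178, 13.694416, 6.850492, 11.108752, 34.926463,
                148.241969, 282.654922, 395.841451, 431.373057, 453.856179,
                454.740644, 534.259124, 496.229004, 506.020344, 398.062304,
                470.961619, 499.712105, 550.265929, 554.885729, 377.628888,
                283.504747, 204.010032, 112.085140)
)

# Three busiest hours of the day
top_hours <- head(hourly_avg[order(-hourly_avg$avg_steps), ], 3)
top_hours$hour  # 19 18 12

# Quietest hour between the midday and evening peaks
daytime <- subset(hourly_avg, hour >= 12 & hour <= 19)
dip_hour <- daytime$hour[which.min(daytime$avg_steps)]  # 15

# Unit conversions from the daily activity and sleep summaries
sedentary_share  <- 992.5426 / (24 * 60)   # share of a 24-hour day spent sedentary
sleep_hours      <- 419.1732 / 60          # average minutes asleep, in hours
sleep_efficiency <- 419.1732 / 458.4829    # minutes asleep per minute in bed

round(c(sedentary_share = sedentary_share,
        sleep_hours = sleep_hours,
        sleep_efficiency = sleep_efficiency), 3)
```

On these numbers the average user is sedentary for roughly 69% of the day, sleeps just under 7 hours, and is asleep for about 91% of the time spent in bed.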
Strategic Takeaways:
- Bellabeat should emphasize holistic health (sleep + activity + calories) in marketing campaigns.
- App features like personal nudges (movement reminders, sleep notifications) can align with observed user behaviour.
- These insights support promoting the Bellabeat app as a central hub for lifestyle improvement.
Act
Based on the analysis, the following recommendations are proposed to
help Bellabeat strengthen its marketing strategy and better engage its
users.
Key Recommendations
- Promote Daily Activity Tracking
  - Users show high sedentary minutes but clear benefits from increased
    step counts.
  - Bellabeat should highlight step tracking and calorie burn
    features in its app campaigns.
  - Introduce customizable reminders to nudge users
    during inactive periods.
- Capitalize on Hourly Trends
  - Peak activity occurs around noon and in the early evening, with a
    dip in mid-afternoon.
  - Bellabeat can schedule in-app notifications, challenges, or
    motivational content during the mid-afternoon dip and other less
    active hours to encourage movement.
- Highlight Holistic Health
  - Average sleep (about 7 hours) sits at the low end of the
    recommended 7-9 hours.
  - Marketing should emphasize Bellabeat’s holistic
    approach, combining sleep, activity, and calorie
    tracking.
  - Offer personalized insights (e.g., “Better sleep
    improves your daily activity”) to connect wellness behaviors.
- Leverage Weight Log Data
  - Weight/BMI data is inconsistently tracked, but those who log it are
    likely highly engaged users.
  - Bellabeat could promote premium features (like
    weight insights, nutrition tracking) to these users.
- Position the Bellabeat App as a Wellness Coach
  - Use trends discovered (steps ↔︎ calories, sleep ↔︎ activity) to
    support marketing campaigns.
  - Present the app as a 24/7 wellness coach that helps
    women improve activity, rest, and overall lifestyle balance.
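The weight-log recommendation depends on how many users log their weight and how often. A minimal sketch of that count follows, using only the ten weight log rows shown in the preview tables (`weight_preview` is a hypothetical name); the full table would be needed before drawing any conclusion about engagement.

```r
library(dplyr)

# id and is_manual_report from the ten previewed weight log rows
weight_preview <- data.frame(
  id = c(1503960366, 1927972279, 2347167796, 2873212765, 2873212765,
         2891001357, 4445114986, 4558609924, 4702921684, 6962181067),
  is_manual_report = c(TRUE, FALSE, TRUE, TRUE, TRUE,
                       TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Number of weight entries per user, most frequent loggers first
logs_per_user <- weight_preview %>%
  count(id, sort = TRUE, name = "n_logs")

# Share of entries that were typed in by hand rather than synced
manual_share <- mean(weight_preview$is_manual_report)

logs_per_user
manual_share
```

Users with many entries would be the natural audience for the premium features mentioned above, while a high manual share hints at how much effort logging currently takes.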
Strategic Impact
- These insights support data-driven marketing
campaigns showcasing Bellabeat’s value beyond just a
device.
- By aligning messaging with actual user behavior, Bellabeat can
boost user retention, app engagement, and brand
loyalty.
- Implementing these recommendations could position Bellabeat not
only as a fitness tracker but also as a holistic
health partner for women.