Ask

Bellabeat, a wellness technology company, has provided Fitbit fitness tracker data and asked us to analyze smart device usage trends. The goal is to focus on one Bellabeat product and use data-driven insights to guide marketing strategy. Key questions include:

  • What trends can we observe in smart device (Fitbit) usage data?
  • How do these trends relate to Bellabeat’s target customers and products?
  • How can these insights influence Bellabeat’s marketing and product decisions?

This analysis will follow the six-step Google Data Analytics framework: Ask, Prepare, Process, Analyze, Share, and Act. We start by understanding the business task and data sources.

Prepare

We will use several of the Fitbit data files provided, including:

  • dailyActivity_merged.csv (daily activity metrics: steps, distance, active minutes, calories burned)
  • sleepDay_merged.csv (daily sleep records: minutes asleep, time in bed)
  • weightLogInfo_merged.csv (user weight and BMI logs)
  • hourlySteps_merged.csv (hourly step counts for each day)

These CSV files come from two consecutive one-month exports (3.12.16-4.11.16 and 4.12.16-5.12.16). We load each file with the tidyverse read_csv() function, stack the two exports with bind_rows() where both are available, and inspect the data.

library(tidyverse)
library(lubridate)
library(knitr)

# Load daily activity
daily_activity1 <- read_csv("mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/dailyActivity_merged.csv")
daily_activity2 <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
daily_activity  <- bind_rows(daily_activity1, daily_activity2)

# Load sleep data (only available in second month)
sleep_data <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")

# Load weight logs
weight_log1 <- read_csv("mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/weightLogInfo_merged.csv")
weight_log2 <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
weight_log  <- bind_rows(weight_log1, weight_log2)

# Load hourly steps
hourly_steps1 <- read_csv("mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/hourlySteps_merged.csv")
hourly_steps2 <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
hourly_steps  <- bind_rows(hourly_steps1, hourly_steps2)

# Display tidy previews
kable(head(daily_activity, 10), caption = "Preview: Daily Activity Data")
Preview: Daily Activity Data
Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
1503960366 3/25/2016 11004 7.11 7.11 0 2.57 0.46 4.07 0 33 12 205 804 1819
1503960366 3/26/2016 17609 11.55 11.55 0 6.92 0.73 3.91 0 89 17 274 588 2154
1503960366 3/27/2016 12736 8.53 8.53 0 4.66 0.16 3.71 0 56 5 268 605 1944
1503960366 3/28/2016 13231 8.93 8.93 0 3.19 0.79 4.95 0 39 20 224 1080 1932
1503960366 3/29/2016 12041 7.85 7.85 0 2.16 1.09 4.61 0 28 28 243 763 1886
1503960366 3/30/2016 10970 7.16 7.16 0 2.36 0.51 4.29 0 30 13 223 1174 1820
1503960366 3/31/2016 12256 7.86 7.86 0 2.29 0.49 5.04 0 33 12 239 820 1889
1503960366 4/1/2016 12262 7.87 7.87 0 3.32 0.83 3.64 0 47 21 200 866 1868
1503960366 4/2/2016 11248 7.25 7.25 0 3.00 0.45 3.74 0 40 11 244 636 1843
1503960366 4/3/2016 10016 6.37 6.37 0 0.91 1.28 4.18 0 15 30 314 655 1850
kable(head(sleep_data, 10), caption = "Preview: Sleep Data (only Apr–May)")
Preview: Sleep Data (only Apr–May)
Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
1503960366 4/12/2016 12:00:00 AM 1 327 346
1503960366 4/13/2016 12:00:00 AM 2 384 407
1503960366 4/15/2016 12:00:00 AM 1 412 442
1503960366 4/16/2016 12:00:00 AM 2 340 367
1503960366 4/17/2016 12:00:00 AM 1 700 712
1503960366 4/19/2016 12:00:00 AM 1 304 320
1503960366 4/20/2016 12:00:00 AM 1 360 377
1503960366 4/21/2016 12:00:00 AM 1 325 364
1503960366 4/23/2016 12:00:00 AM 1 361 384
1503960366 4/24/2016 12:00:00 AM 1 430 449
kable(head(weight_log, 10), caption = "Preview: Weight Log Data")
Preview: Weight Log Data
Id Date WeightKg WeightPounds Fat BMI IsManualReport LogId
1503960366 4/5/2016 11:59:59 PM 53.3 117.5064 22 22.97 TRUE 1.459901e+12
1927972279 4/10/2016 6:33:26 PM 129.6 285.7191 NA 46.17 FALSE 1.460313e+12
2347167796 4/3/2016 11:59:59 PM 63.4 139.7731 10 24.77 TRUE 1.459728e+12
2873212765 4/6/2016 11:59:59 PM 56.7 125.0021 NA 21.45 TRUE 1.459987e+12
2873212765 4/7/2016 11:59:59 PM 57.2 126.1044 NA 21.65 TRUE 1.460074e+12
2891001357 4/5/2016 11:59:59 PM 88.4 194.8886 NA 25.03 TRUE 1.459901e+12
4445114986 3/30/2016 11:59:59 PM 92.4 203.7071 NA 35.01 TRUE 1.459382e+12
4558609924 4/8/2016 11:59:59 PM 69.4 153.0008 NA 27.14 TRUE 1.460160e+12
4702921684 4/4/2016 11:59:59 PM 99.7 219.8009 NA 26.11 TRUE 1.459814e+12
6962181067 3/30/2016 11:59:59 PM 61.5 135.5843 NA 24.03 TRUE 1.459382e+12
kable(head(hourly_steps, 10), caption = "Preview: Hourly Steps Data")
Preview: Hourly Steps Data
Id ActivityHour StepTotal
1503960366 3/12/2016 12:00:00 AM 0
1503960366 3/12/2016 1:00:00 AM 0
1503960366 3/12/2016 2:00:00 AM 0
1503960366 3/12/2016 3:00:00 AM 0
1503960366 3/12/2016 4:00:00 AM 0
1503960366 3/12/2016 5:00:00 AM 0
1503960366 3/12/2016 6:00:00 AM 0
1503960366 3/12/2016 7:00:00 AM 0
1503960366 3/12/2016 8:00:00 AM 0
1503960366 3/12/2016 9:00:00 AM 8

Note:

The sleepDay_merged.csv dataset was only available in the April-May (4.12.16-5.12.16) export folder. As a result, the sleep analysis covers a shorter period and fewer records than the activity and step data, which span both the March-April and April-May exports. This limitation is taken into account when interpreting trends.
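
One way to make this limitation concrete is to count the rows and distinct users in each table. The sketch below uses small made-up tables in place of the real ones; the helper name coverage() is hypothetical, and only the Id column name is taken from the previews above.

```r
library(dplyr)

# Hypothetical helper: rows and distinct users for one table.
# Assumes the table has an `Id` column, as in the previews above.
coverage <- function(df, name) {
  tibble(dataset = name, n_rows = nrow(df), n_users = n_distinct(df$Id))
}

# Toy stand-ins for the real tables (values are illustrative only).
toy_activity <- tibble(Id = c(1, 1, 2, 3), TotalSteps = c(100, 200, 300, 400))
toy_sleep    <- tibble(Id = c(1, 1), TotalMinutesAsleep = c(400, 420))

# One row per dataset, so the sample sizes can be compared side by side.
coverage_table <- bind_rows(
  coverage(toy_activity, "Daily Activity"),
  coverage(toy_sleep, "Sleep")
)
print(coverage_table)
```

With the real data, the equivalent calls would be coverage(daily_activity, "Daily Activity") and coverage(sleep_data, "Sleep").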

Process

The Process phase is about cleaning and preparing the datasets before analysis. For the Bellabeat/Fitbit case, we typically:

  • Convert dates and times into proper formats.
  • Remove duplicates.
  • Check for missing values.
  • Standardize column names.
library(janitor)

# Clean column names to snake_case for consistency
daily_activity <- clean_names(daily_activity)
sleep_data     <- clean_names(sleep_data)
weight_log     <- clean_names(weight_log)
hourly_steps   <- clean_names(hourly_steps)

# Convert date columns to proper date formats
daily_activity$activity_date <- mdy(daily_activity$activity_date)
sleep_data$sleep_day         <- mdy_hms(sleep_data$sleep_day)
weight_log$date              <- mdy_hms(weight_log$date)
hourly_steps$activity_hour   <- mdy_hms(hourly_steps$activity_hour)

# Remove duplicates
daily_activity <- distinct(daily_activity)
sleep_data     <- distinct(sleep_data)
weight_log     <- distinct(weight_log)
hourly_steps   <- distinct(hourly_steps)

# Check missing values
missing_summary <- tibble(
  dataset = c("Daily Activity", "Sleep", "Weight Log", "Hourly Steps"),
  missing = c(
    sum(is.na(daily_activity)),
    sum(is.na(sleep_data)),
    sum(is.na(weight_log)),
    sum(is.na(hourly_steps))
  )
)

# Display summary tables
kable(head(daily_activity, 10), caption = "Cleaned Daily Activity Data (first 10 rows)")
Cleaned Daily Activity Data (first 10 rows)
id activity_date total_steps total_distance tracker_distance logged_activities_distance very_active_distance moderately_active_distance light_active_distance sedentary_active_distance very_active_minutes fairly_active_minutes lightly_active_minutes sedentary_minutes calories
1503960366 2016-03-25 11004 7.11 7.11 0 2.57 0.46 4.07 0 33 12 205 804 1819
1503960366 2016-03-26 17609 11.55 11.55 0 6.92 0.73 3.91 0 89 17 274 588 2154
1503960366 2016-03-27 12736 8.53 8.53 0 4.66 0.16 3.71 0 56 5 268 605 1944
1503960366 2016-03-28 13231 8.93 8.93 0 3.19 0.79 4.95 0 39 20 224 1080 1932
1503960366 2016-03-29 12041 7.85 7.85 0 2.16 1.09 4.61 0 28 28 243 763 1886
1503960366 2016-03-30 10970 7.16 7.16 0 2.36 0.51 4.29 0 30 13 223 1174 1820
1503960366 2016-03-31 12256 7.86 7.86 0 2.29 0.49 5.04 0 33 12 239 820 1889
1503960366 2016-04-01 12262 7.87 7.87 0 3.32 0.83 3.64 0 47 21 200 866 1868
1503960366 2016-04-02 11248 7.25 7.25 0 3.00 0.45 3.74 0 40 11 244 636 1843
1503960366 2016-04-03 10016 6.37 6.37 0 0.91 1.28 4.18 0 15 30 314 655 1850
kable(head(sleep_data, 10), caption = "Cleaned Sleep Data (first 10 rows)")
Cleaned Sleep Data (first 10 rows)
id sleep_day total_sleep_records total_minutes_asleep total_time_in_bed
1503960366 2016-04-12 1 327 346
1503960366 2016-04-13 2 384 407
1503960366 2016-04-15 1 412 442
1503960366 2016-04-16 2 340 367
1503960366 2016-04-17 1 700 712
1503960366 2016-04-19 1 304 320
1503960366 2016-04-20 1 360 377
1503960366 2016-04-21 1 325 364
1503960366 2016-04-23 1 361 384
1503960366 2016-04-24 1 430 449
kable(head(weight_log, 10), caption = "Cleaned Weight Log Data (first 10 rows)")
Cleaned Weight Log Data (first 10 rows)
id date weight_kg weight_pounds fat bmi is_manual_report log_id
1503960366 2016-04-05 23:59:59 53.3 117.5064 22 22.97 TRUE 1.459901e+12
1927972279 2016-04-10 18:33:26 129.6 285.7191 NA 46.17 FALSE 1.460313e+12
2347167796 2016-04-03 23:59:59 63.4 139.7731 10 24.77 TRUE 1.459728e+12
2873212765 2016-04-06 23:59:59 56.7 125.0021 NA 21.45 TRUE 1.459987e+12
2873212765 2016-04-07 23:59:59 57.2 126.1044 NA 21.65 TRUE 1.460074e+12
2891001357 2016-04-05 23:59:59 88.4 194.8886 NA 25.03 TRUE 1.459901e+12
4445114986 2016-03-30 23:59:59 92.4 203.7071 NA 35.01 TRUE 1.459382e+12
4558609924 2016-04-08 23:59:59 69.4 153.0008 NA 27.14 TRUE 1.460160e+12
4702921684 2016-04-04 23:59:59 99.7 219.8009 NA 26.11 TRUE 1.459814e+12
6962181067 2016-03-30 23:59:59 61.5 135.5843 NA 24.03 TRUE 1.459382e+12
kable(head(hourly_steps, 10), caption = "Cleaned Hourly Steps Data (first 10 rows)")
Cleaned Hourly Steps Data (first 10 rows)
id activity_hour step_total
1503960366 2016-03-12 00:00:00 0
1503960366 2016-03-12 01:00:00 0
1503960366 2016-03-12 02:00:00 0
1503960366 2016-03-12 03:00:00 0
1503960366 2016-03-12 04:00:00 0
1503960366 2016-03-12 05:00:00 0
1503960366 2016-03-12 06:00:00 0
1503960366 2016-03-12 07:00:00 0
1503960366 2016-03-12 08:00:00 0
1503960366 2016-03-12 09:00:00 8
kable(missing_summary, caption = "Missing Values Summary Across Datasets")
Missing Values Summary Across Datasets
dataset missing
Daily Activity 0
Sleep 0
Weight Log 94
Hourly Steps 0

Notes:

  • Date columns were successfully converted into R date and date-time objects, so time-based analysis can rely on them.
  • Duplicate records were removed to avoid skewing results.
  • Missing values appear only in the weight log (94 in total). The preview suggests they sit in the fat column, which most entries leave blank; weight logging itself is also sparse because not all users record their weight consistently.
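
The missing-value check above reports one total per table. A per-column breakdown, together with a count of the rows that distinct() removes, shows where the gaps actually are. This is a minimal sketch on a made-up table; the values and the toy_weight name are illustrative assumptions, not the real weight log.

```r
library(dplyr)

# Toy stand-in for the cleaned weight log (values are illustrative only).
toy_weight <- tibble(
  id        = c(1, 2, 2, 3),
  weight_kg = c(53.3, 129.6, 129.6, 63.4),
  fat       = c(22, NA, NA, NA)
)

# Missing values per column rather than one total per table.
na_by_column <- toy_weight %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Number of rows that distinct() would drop as exact duplicates.
n_duplicates <- nrow(toy_weight) - nrow(distinct(toy_weight))

print(na_by_column)
print(n_duplicates)
```

Running the same two steps on weight_log before deduplication would confirm whether the 94 missing values all fall in one column.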

Analyze

In this phase, we explore trends and relationships in the Fitbit data to uncover insights about user behavior.
The analysis focuses on daily activity, hourly steps, and sleep, as these are most relevant to Bellabeat’s wellness products and app.

library(ggplot2)

# --- Daily Activity Summary ---
activity_summary <- daily_activity %>%
  summarise(
    avg_steps = mean(total_steps, na.rm = TRUE),
    avg_calories = mean(calories, na.rm = TRUE),
    avg_sedentary = mean(sedentary_minutes, na.rm = TRUE),
    avg_active = mean(very_active_minutes, na.rm = TRUE)
  )

kable(activity_summary, caption = "Average Daily Activity Metrics")
Average Daily Activity Metrics
avg_steps avg_calories avg_sedentary avg_active
7280.898 2266.266 992.5426 19.67931
# --- Correlation: Steps vs Calories ---
steps_calories_plot <- ggplot(daily_activity, aes(x = total_steps, y = calories)) +
  geom_point(alpha = 0.5, color = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "darkred") +
  labs(
    title = "Relationship Between Daily Steps and Calories Burned",
    x = "Total Steps",
    y = "Calories Burned"
  ) +
  theme_minimal()

steps_calories_plot

# --- Average Steps by Hour ---
hourly_steps_summary <- hourly_steps %>%
  mutate(hour = hour(activity_hour)) %>%
  group_by(hour) %>%
  summarise(avg_steps = mean(step_total, na.rm = TRUE))

kable(hourly_steps_summary, caption = "Average Steps by Hour of Day")
Average Steps by Hour of Day
hour avg_steps
0 43.361240
1 21.884178
2 13.694416
3 6.850492
4 11.108752
5 34.926463
6 148.241969
7 282.654922
8 395.841451
9 431.373057
10 453.856179
11 454.740644
12 534.259124
13 496.229004
14 506.020344
15 398.062304
16 470.961619
17 499.712105
18 550.265929
19 554.885729
20 377.628888
21 283.504747
22 204.010032
23 112.085140
hourly_steps_plot <- ggplot(hourly_steps_summary, aes(x = hour, y = avg_steps)) +
  geom_line(color = "forestgreen", linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  labs(
    title = "Hourly Average Step Trends",
    x = "Hour of Day",
    y = "Average Steps"
  ) +
  theme_minimal()

hourly_steps_plot

# --- Sleep Summary ---
sleep_summary <- sleep_data %>%
  summarise(
    avg_minutes_asleep = mean(total_minutes_asleep, na.rm = TRUE),
    avg_time_in_bed = mean(total_time_in_bed, na.rm = TRUE)
  )

kable(sleep_summary, caption = "Average Sleep Metrics")
Average Sleep Metrics
avg_minutes_asleep avg_time_in_bed
419.1732 458.4829
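# --- Correlation: Sleep vs Activity ---
# Match each sleep record to the same user's activity on the same date.
# Joining on id alone would pair every sleep night with every activity day.
sleep_activity <- sleep_data %>%
  mutate(activity_date = as_date(sleep_day)) %>%
  inner_join(daily_activity, by = c("id", "activity_date"))

sleep_activity_plot <- ggplot(
  sleep_activity,
  aes(x = total_minutes_asleep, y = total_steps)
) +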
  geom_point(alpha = 0.5, color = "purple") +
  geom_smooth(method = "lm", se = FALSE, color = "darkorange") +
  labs(
    title = "Relationship Between Sleep Duration and Daily Steps",
    x = "Minutes Asleep",
    y = "Total Steps"
  ) +
  theme_minimal()

sleep_activity_plot

Notes:

  • Daily activity summary: Users average about 7,281 steps and about 993 sedentary minutes per day, a baseline for typical tracker engagement.
  • Steps vs calories: Shows how calorie burn rises with daily step count.
  • Hourly steps: Reveals peak activity times (around midday and early evening in this data), which is useful for timing notifications and reminders in the app.
  • Sleep summary: Shows whether users meet the recommended 7-9 hours of sleep, which informs Bellabeat's sleep tracking features.
  • Sleep vs activity: Examines whether longer sleep aligns with higher activity, which would reinforce holistic health messaging.
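
The two scatter plots show direction but not strength. A correlation coefficient puts a number on each relationship, and for sleep versus steps the records need to be matched by user and date first. The sketch below runs on made-up tables, so its coefficients carry no meaning for the Fitbit data; the table names and values are assumptions, and only the column names follow the cleaned data above.

```r
library(dplyr)

# Toy stand-ins for the cleaned tables (values are illustrative only).
toy_activity <- tibble(
  id            = c(1, 1, 1, 2, 2),
  activity_date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-14",
                            "2016-04-12", "2016-04-13")),
  total_steps   = c(4000, 8000, 12000, 6000, 10000),
  calories      = c(1800, 2000, 2300, 1900, 2200)
)
toy_sleep <- tibble(
  id                   = c(1, 1, 2),
  sleep_day            = as.POSIXct(c("2016-04-12", "2016-04-13", "2016-04-12"),
                                    tz = "UTC"),
  total_minutes_asleep = c(420, 390, 450)
)

# Pearson correlation between daily steps and calories.
r_steps_calories <- cor(toy_activity$total_steps, toy_activity$calories)

# Match each sleep record to the same user's activity on the same date.
sleep_activity <- toy_sleep %>%
  mutate(activity_date = as.Date(sleep_day)) %>%
  inner_join(toy_activity, by = c("id", "activity_date"))

# Pearson correlation between minutes asleep and steps on matched days.
r_sleep_steps <- cor(sleep_activity$total_minutes_asleep,
                     sleep_activity$total_steps)

print(r_steps_calories)
print(r_sleep_steps)
```

An inner join keeps only days that have both a sleep record and an activity record, so the sleep correlation is based on fewer rows than the steps and calories one.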

Share

In this phase, we summarize key insights from the analysis and highlight how they relate to Bellabeat’s business objectives.
The findings are presented with visualizations and tables for clarity.

Key Daily Activity Trends
avg_steps avg_calories avg_sedentary avg_active
7280.898 2266.266 992.5426 19.67931

Key Sleep Trends
avg_minutes_asleep avg_time_in_bed
419.1732 458.4829
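
The averages in the two tables above are in minutes. Converting them to hours and ratios makes them easier to compare with guidelines such as the 7-9 hour sleep recommendation. The sketch below is plain arithmetic on the reported averages; the variable names are illustrative.

```r
# Averages copied from the summary tables above (minutes per day).
avg_sedentary_min  <- 992.5426
avg_minutes_asleep <- 419.1732
avg_time_in_bed    <- 458.4829

sedentary_hours  <- avg_sedentary_min / 60                # hours per day
sedentary_share  <- avg_sedentary_min / (24 * 60)         # fraction of a 24-hour day
sleep_hours      <- avg_minutes_asleep / 60               # hours asleep per night
awake_in_bed_min <- avg_time_in_bed - avg_minutes_asleep  # minutes in bed but not asleep
sleep_efficiency <- avg_minutes_asleep / avg_time_in_bed  # share of time in bed spent asleep

print(round(c(
  sedentary_hours  = sedentary_hours,
  sedentary_share  = sedentary_share,
  sleep_hours      = sleep_hours,
  awake_in_bed_min = awake_in_bed_min,
  sleep_efficiency = sleep_efficiency
), 2))
```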

Key Insights for Bellabeat:

  1. Daily Activity:
    • Users average about 7,281 steps and about 2,266 calories burned per day.
    • However, sedentary time is high, averaging about 993 minutes (roughly 16.5 hours) per day, which points to an opportunity for movement reminders (nudges).
  2. Hourly Activity:
    • Steps peak around midday (12:00 to 14:00) and in the early evening (17:00 to 19:00), with a dip around 15:00.
    • Bellabeat could schedule app notifications or coaching tips during quieter hours, such as mid-afternoon, to boost engagement.
  3. Calories vs Steps:
    • A clear positive correlation shows that days with more steps tend to have more calories burned.
    • Bellabeat can promote its ability to track calorie burn in real time, motivating users to hit daily step goals.
  4. Sleep patterns:
    • Average sleep is about 419 minutes (just under 7 hours), at the lower edge of the recommended 7-9 hours.
    • Bellabeat can promote its sleep tracking features and highlight the benefits of consistent rest.
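  5. Sleep vs Activity:
    • Users who sleep longer appear to have better activity levels, though this should be confirmed with a correlation estimate on records matched by user and date.
    • This supports Bellabeat's positioning as a holistic wellness tracker, not just a fitness device.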
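
The hourly pattern can be checked directly against the averages reported in the Analyze section. The sketch below copies those 24 values into a small table and ranks them; the 10:00 to 19:00 daytime window used to find the quietest hour is an assumption chosen for illustration.

```r
library(dplyr)

# Hourly averages copied from the "Average Steps by Hour of Day" table.
hourly_avg <- tibble(
  hour = 0:23,
  avg_steps = c(
    43.361240, 21.884178, 13.694416, 6.850492, 11.108752, 34.926463,
    148.241969, 282.654922, 395.841451, 431.373057, 453.856179, 454.740644,
    534.259124, 496.229004, 506.020344, 398.062304, 470.961619, 499.712105,
    550.265929, 554.885729, 377.628888, 283.504747, 204.010032, 112.085140
  )
)

# Three busiest hours overall, highest first.
peak_hours <- hourly_avg %>%
  slice_max(avg_steps, n = 3)

# Quietest hour within the assumed daytime window of 10:00 to 19:00.
quiet_hour <- hourly_avg %>%
  filter(between(hour, 10, 19)) %>%
  slice_min(avg_steps, n = 1)

print(peak_hours)
print(quiet_hour)
```

The busiest hours come out as 19:00, 18:00, and 12:00, and the quietest hour in the window is 15:00, which is the natural slot for a movement reminder.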

Strategic Takeaways:

- Bellabeat should emphasize holistic health (sleep + activity + calories) in marketing campaigns.
- App features like **personal nudges** (movement reminders, sleep notifications) can align with observed user behaviour.
- The insights support promoting the **Bellabeat app** as a central hub for lifestyle improvement.

Act

Based on the analysis, the following recommendations are proposed to help Bellabeat strengthen its marketing strategy and better engage its users.

Key Recommendations

  • Promote Daily Activity Tracking
    • Users show high sedentary minutes but clear benefits from increased step counts.
    • Bellabeat should highlight step tracking and calorie burn features in its app campaigns.
    • Introduce customizable reminders to nudge users during inactive periods.
  • Capitalize on Hourly Trends
    • Peak activity occurs around midday and in the early evening, with a dip in mid-afternoon (around 15:00).
    • Bellabeat can schedule in-app notifications, challenges, or motivational content during the mid-afternoon dip to encourage movement.
  • Highlight Holistic Health
    • Sleep data shows an average duration of just under 7 hours, at the lower edge of recommended levels.
    • Marketing should emphasize Bellabeat’s holistic approach — combining sleep, activity, and calorie tracking.
    • Offer personalized insights (e.g., “Better sleep improves your daily activity”) to connect wellness behaviors.
  • Leverage Weight Log Data
    • Weight/BMI data is inconsistently tracked, but those who log it are likely highly engaged users.
    • Bellabeat could promote premium features (like weight insights, nutrition tracking) to these users.
  • Position the Bellabeat App as a Wellness Coach
    • Use trends discovered (steps ↔︎ calories, sleep ↔︎ activity) to support marketing campaigns.
    • Present the app as a 24/7 wellness coach that helps women improve activity, rest, and overall lifestyle balance.

Strategic Impact

  • These insights support data-driven marketing campaigns showcasing Bellabeat’s value beyond just a device.
  • By aligning messaging with actual user behavior, Bellabeat can boost user retention, app engagement, and brand loyalty.
  • Implementing these recommendations could position Bellabeat not only as a fitness tracker but also as a holistic health partner for women.