1 Introduction

Bellabeat is a high-tech manufacturer of health-focused products designed specifically for women. Although Bellabeat is currently a successful small company, it has significant potential to grow into a major player in the global smart device market. This analysis explores smart device usage patterns to uncover data-driven insights that can inform Bellabeat’s marketing strategy and support future growth.

2 Business Task

The task is to gain insight into how consumers use their smart devices, identify new growth opportunities for the company, and recommend improvements to Bellabeat's marketing strategy based on the observed trends.

3 Load packages
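
The package-loading chunk is not shown in this write-up. Judging only from the functions called in the later sections, a setup along the following lines is assumed; the exact list used in the original analysis is not known, and `library(tidyverse)` would cover most of it as well.

```r
# Assumed package set, inferred from the functions used later in this report.
library(dplyr)      # mutate(), group_by(), summarise(), joins, n_distinct(), glimpse()
library(tidyr)      # pivot_longer()
library(lubridate)  # mdy(), mdy_hms(), hour()
library(ggplot2)    # ggplot() and the geoms used in the charts

# readr and knitr are called with the :: prefix below (readr::read_csv,
# knitr::kable), so they only need to be installed, not attached.
```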

4 Dataset Import

This analysis uses the Fitbit Fitness Tracker Data dataset, available on Kaggle (https://www.kaggle.com/datasets/arashnic/fitbit/data).

activity  <- readr::read_csv("data/dailyActivity_merged.csv", show_col_types = FALSE)
intensity <- readr::read_csv("data/hourlyIntensities_merged.csv", show_col_types = FALSE)
calories  <- readr::read_csv("data/hourlyCalories_merged.csv", show_col_types = FALSE)
sleep     <- readr::read_csv("data/sleepDay_merged.csv", show_col_types = FALSE)
weight    <- readr::read_csv("data/weightLogInfo_merged.csv", show_col_types = FALSE)

I verified the imported data by inspecting the first few rows and structure using head() and glimpse() functions.

knitr::kable(head(activity, 6))
| Id | ActivityDate | TotalSteps | TotalDistance | TrackerDistance | LoggedActivitiesDistance | VeryActiveDistance | ModeratelyActiveDistance | LightActiveDistance | SedentaryActiveDistance | VeryActiveMinutes | FairlyActiveMinutes | LightlyActiveMinutes | SedentaryMinutes | Calories |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1503960366 | 4/12/2016 | 13162 | 8.50 | 8.50 | 0 | 1.88 | 0.55 | 6.06 | 0 | 25 | 13 | 328 | 728 | 1985 |
| 1503960366 | 4/13/2016 | 10735 | 6.97 | 6.97 | 0 | 1.57 | 0.69 | 4.71 | 0 | 21 | 19 | 217 | 776 | 1797 |
| 1503960366 | 4/14/2016 | 10460 | 6.74 | 6.74 | 0 | 2.44 | 0.40 | 3.91 | 0 | 30 | 11 | 181 | 1218 | 1776 |
| 1503960366 | 4/15/2016 | 9762 | 6.28 | 6.28 | 0 | 2.14 | 1.26 | 2.83 | 0 | 29 | 34 | 209 | 726 | 1745 |
| 1503960366 | 4/16/2016 | 12669 | 8.16 | 8.16 | 0 | 2.71 | 0.41 | 5.04 | 0 | 36 | 10 | 221 | 773 | 1863 |
| 1503960366 | 4/17/2016 | 9705 | 6.48 | 6.48 | 0 | 3.19 | 0.78 | 2.51 | 0 | 38 | 20 | 164 | 539 | 1728 |

The date and timestamp columns were imported as character strings, so before analysis I converted them to date-time format and split them into separate date and time columns.

5 Fixing Formatting

Date and time variables were standardized using lubridate to enable consistent daily and hourly analysis across datasets.

#activity
activity <- activity %>%
  mutate(date = mdy(ActivityDate))
#intensity
intensity <- intensity %>%
  mutate(
    ActivityHour = mdy_hms(ActivityHour),
    date = as.Date(ActivityHour),
    time = format(ActivityHour, "%H:%M:%S")
  )
#calories
calories <- calories %>%
  mutate(
    ActivityHour = mdy_hms(ActivityHour),
    date = as.Date(ActivityHour),
    time = format(ActivityHour, "%H:%M:%S")
  )
#sleep
sleep <- sleep %>%
  mutate(
    SleepDay = mdy_hms(SleepDay),
    date = as.Date(SleepDay)
  )
#weight
weight <- weight %>%
  mutate(
    Date = mdy_hms(Date),
    date = as.Date(Date)
  )
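
Because a failed parse yields NA rather than an error, it can help to check the conversion on a few values. The sketch below uses made-up timestamps in the month/day/year, 12-hour-clock layout that the `mdy_hms()` calls above assume; the strings are illustrative and not taken from the dataset.

```r
library(lubridate)

# Made-up timestamps in the month/day/year + AM/PM layout assumed above.
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM", "4/13/2016 11:00:00 PM")

parsed      <- mdy_hms(raw)                 # POSIXct, UTC by default
parsed_date <- as.Date(parsed)              # calendar date
parsed_time <- format(parsed, "%H:%M:%S")   # 24-hour clock time as text

# Any NA here would mean a string did not match the expected layout.
n_failed <- sum(is.na(parsed))
```

Running the same `sum(is.na(...))` check on each converted column is a cheap way to confirm that no rows were lost to parsing.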

With the datasets properly formatted, I moved on to exploratory analysis.

6 Exploratory Data Analysis

Initial exploratory analysis was conducted to understand user participation, activity patterns, and data completeness across the datasets.

tibble(
  dataset = c("Activity", "Calories", "Intensity", "Sleep", "Weight"),
  participants = c(
    n_distinct(activity$Id),
    n_distinct(calories$Id),
    n_distinct(intensity$Id),
    n_distinct(sleep$Id),
    n_distinct(weight$Id)
  )
) %>%
  knitr::kable()
| dataset | participants |
|---|---|
| Activity | 33 |
| Calories | 33 |
| Intensity | 33 |
| Sleep | 24 |
| Weight | 8 |

The activity, calories, and intensity datasets each contain records from 33 unique users, indicating consistent coverage across daily and hourly activity tracking. The sleep dataset includes 24 users, while the weight dataset contains data from only 8 users. Because of this small sample, the weight dataset was excluded from the analysis beyond the summary statistics below, to avoid unreliable conclusions.
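
Participant counts alone do not show how complete each user's record is. As a rough sketch, the following counts recorded days per user and fully duplicated rows on a toy table; the table `toy` and its values are invented for illustration, and the same calls could be pointed at `activity` or `sleep`.

```r
library(dplyr)

# Toy stand-in for one of the daily tables; values are invented.
toy <- tibble(
  Id   = c(1, 1, 1, 2, 2, 3),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-13",
                   "2016-04-12", "2016-04-13", "2016-04-12"))
)

# Days recorded per user: low counts flag users with sparse data.
days_per_user <- toy %>%
  distinct(Id, date) %>%
  count(Id, name = "days_recorded")

# Fully duplicated rows, which would inflate counts if left in.
n_duplicates <- sum(duplicated(toy))
```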

Let’s have a look at summary statistics of the datasets:

# activity
activity %>%
  select(TotalSteps, TotalDistance, SedentaryMinutes, Calories) %>%
  summary()
##    TotalSteps    TotalDistance    SedentaryMinutes    Calories   
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0   Min.   :   0  
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8   1st Qu.:1828  
##  Median : 7406   Median : 5.245   Median :1057.5   Median :2134  
##  Mean   : 7638   Mean   : 5.490   Mean   : 991.2   Mean   :2304  
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5   3rd Qu.:2793  
##  Max.   :36019   Max.   :28.030   Max.   :1440.0   Max.   :4900
# number of active minutes per intensity category
activity %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  summary()
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0       
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127.0       
##  Median :  4.00    Median :  6.00      Median :199.0       
##  Mean   : 21.16    Mean   : 13.56      Mean   :192.8       
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264.0       
##  Max.   :210.00    Max.   :143.00      Max.   :518.0
# calories 
summary(calories$Calories)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   42.00   63.00   83.00   97.39  108.00  948.00
# sleep
sleep %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()
##  TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
##  Min.   :1.000     Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:1.000     1st Qu.:361.0      1st Qu.:403.0  
##  Median :1.000     Median :433.0      Median :463.0  
##  Mean   :1.119     Mean   :419.5      Mean   :458.6  
##  3rd Qu.:1.000     3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :3.000     Max.   :796.0      Max.   :961.0
# weight
weight %>%
  select(WeightKg, BMI) %>%
  summary()
##     WeightKg           BMI       
##  Min.   : 52.60   Min.   :21.45  
##  1st Qu.: 61.40   1st Qu.:23.96  
##  Median : 62.50   Median :24.39  
##  Mean   : 72.04   Mean   :25.19  
##  3rd Qu.: 85.05   3rd Qu.:25.56  
##  Max.   :133.50   Max.   :47.54

Summary statistics revealed wide variation in daily steps, calories burned, and sedentary time, suggesting differing activity levels among users.

Average sedentary time is 991 minutes (about 16.5 hours) per day, so a large proportion of each day is spent in sedentary or lightly active states, while very active minutes are comparatively low.

According to guidance aligned with CDC physical activity recommendations, health benefits for adults under 60 are commonly observed at approximately 8,000–10,000 steps per day. In this dataset, the average daily step count is 7,638 steps, which falls slightly below this range, indicating potential opportunities to encourage increased daily movement.
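
The mean alone hides how often users actually reach the quoted range. A minimal sketch of that calculation follows, assuming 8,000 steps (the lower end of the range above) as an illustrative cut-off; `toy_steps` and its values are invented, and `activity` could be substituted for it.

```r
library(dplyr)

# Invented daily step counts; the 8,000-step cut-off is only an illustrative threshold.
toy_steps <- tibble(
  Id         = c(1, 1, 2, 2, 3),
  TotalSteps = c(4000, 9000, 12000, 7000, 8000)
)

step_summary <- toy_steps %>%
  summarise(
    mean_steps      = mean(TotalSteps),
    share_days_8000 = mean(TotalSteps >= 8000)  # proportion of days at or above the cut-off
  )
```

Taking the mean of a logical vector gives the proportion of TRUE values, which is why `mean(TotalSteps >= 8000)` returns a share of days.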

Hourly intensity data was aggregated to daily values to align with the daily activity dataset and enable meaningful comparisons across datasets.

daily_intensity <- intensity %>%
  group_by(Id, date) %>%
  summarise(
    mean_intensity = mean(TotalIntensity, na.rm = TRUE),
    max_intensity  = max(TotalIntensity, na.rm = TRUE),
    total_intensity = sum(TotalIntensity, na.rm = TRUE),
    .groups = "drop"
  )

7 Merging Dataset

I joined the activity and daily intensity datasets by Id and date.

daily_activity <- activity %>%
  left_join(daily_intensity, by = c("Id", "date"))
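
A left join keeps every activity row, but it can silently add rows if the key is not unique, or leave NA values where there is no match. The sketch below shows one way to check both on toy tables; `toy_activity` and `toy_intensity` are invented stand-ins for `activity` and `daily_intensity`.

```r
library(dplyr)

# Toy daily tables with invented values.
toy_activity <- tibble(
  Id         = c(1, 1, 2),
  date       = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(5000, 7000, 9000)
)
toy_intensity <- tibble(
  Id              = c(1, 2),
  date            = as.Date(c("2016-04-12", "2016-04-12")),
  total_intensity = c(300, 450)
)

joined <- left_join(toy_activity, toy_intensity, by = c("Id", "date"))

# With a unique (Id, date) key on the right, the left table's row count is preserved.
rows_preserved <- nrow(joined) == nrow(toy_activity)

# Activity days with no matching intensity record (these carry NA after the join).
unmatched <- anti_join(toy_activity, toy_intensity, by = c("Id", "date"))
```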

I also joined the activity dataset with the sleep dataset using Id and date.

activity_sleep <- activity %>%
  inner_join(sleep, by = c("Id", "date"))
n_distinct(activity_sleep$Id)
## [1] 24

Before plotting, I removed days with zero recorded steps or calories, or with a full 1,440 sedentary minutes, since such rows most likely reflect days when the device was not worn.

daily_activity_clean <- daily_activity %>%
  filter(
    TotalSteps > 0,
    Calories > 0,
    SedentaryMinutes < 1440
  )
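
To see how much data the filter above discards, the rows matching each rule can be counted before filtering. This is a sketch on an invented table (`toy_daily`) that mirrors the three conditions; it is not output from the actual dataset.

```r
library(dplyr)

# Invented rows covering each exclusion rule used in the filter above.
toy_daily <- tibble(
  Id               = c(1, 1, 2, 2),
  TotalSteps       = c(8000, 0, 6500, 0),
  Calories         = c(2100, 1500, 1900, 0),
  SedentaryMinutes = c(700, 1440, 900, 1440)
)

toy_clean <- toy_daily %>%
  filter(TotalSteps > 0, Calories > 0, SedentaryMinutes < 1440)

# Rows each rule would flag on its own, and rows dropped in total.
removal_audit <- toy_daily %>%
  summarise(
    zero_steps     = sum(TotalSteps == 0),
    zero_calories  = sum(Calories == 0),
    full_sedentary = sum(SedentaryMinutes >= 1440),
    rows_dropped   = n() - nrow(toy_clean)
  )
```

The per-rule counts can overlap, so they need not add up to `rows_dropped`.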

8 Visualization

# Distribution of Daily Steps
ggplot(daily_activity_clean, aes(TotalSteps)) +
  geom_histogram(bins = 30, fill = "steelblue", colour = "black") +
  labs(
    title = "Distribution of Daily Steps",
    x = "Total Steps",
    y = "Number of Days"
  ) +
  theme_minimal()

This chart shows a wide spread in daily step counts across users, indicating substantial variation in activity levels. While some users are highly active, many record relatively low step counts on most days.

ggplot(daily_activity_clean, aes(TotalSteps, Calories)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Relationship Between Steps and Calories Burned",
    x = "Total Steps",
    y = "Calories"
  ) +
  theme_minimal()

A clear positive relationship exists between total daily steps and calories burned. Users who take more steps tend to expend more energy, which supports using steps as a practical indicator of daily activity.
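
The strength of this relationship can be quantified rather than read off the chart. A minimal sketch follows, using a Pearson correlation and the slope of a simple linear fit on invented values; `toy` stands in for `daily_activity_clean`, and no result from the real data is implied.

```r
# Invented step and calorie values for four days.
toy <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 12000),
  Calories   = c(1600, 1900, 2300, 2800)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship.
r <- cor(toy$TotalSteps, toy$Calories, use = "complete.obs")

# Slope of the fitted line: estimated extra calories per additional step.
fit   <- lm(Calories ~ TotalSteps, data = toy)
slope <- unname(coef(fit)["TotalSteps"])
```

Because each user contributes many days, the observations are not independent, so a coefficient computed this way is best read as descriptive.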

daily_activity_clean %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  pivot_longer(everything(), names_to = "ActivityType", values_to = "Minutes") %>%
  ggplot(aes(ActivityType, Minutes)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Active Minutes by Intensity Level",
    x = "Activity Type",
    y = "Minutes"
  ) +
  theme_minimal()

Lightly active minutes dominate daily activity, while time spent in higher-intensity activity is comparatively low. This suggests opportunities to encourage short bursts of moderate-to-vigorous activity.
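
The boxplot comparison can be backed by the share of active minutes in each category. The sketch below computes those shares on an invented two-day table (`toy_minutes`), reusing the column names from the activity dataset.

```r
library(dplyr)
library(tidyr)

# Invented active-minute values for two days.
toy_minutes <- tibble(
  VeryActiveMinutes    = c(10, 30),
  FairlyActiveMinutes  = c(20, 10),
  LightlyActiveMinutes = c(170, 160)
)

minutes_share <- toy_minutes %>%
  pivot_longer(everything(), names_to = "ActivityType", values_to = "Minutes") %>%
  group_by(ActivityType) %>%
  summarise(total_minutes = sum(Minutes), .groups = "drop") %>%
  mutate(share = total_minutes / sum(total_minutes))  # fraction of all active minutes
```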

intensity %>%
  mutate(hour = hour(ActivityHour)) %>%
  group_by(hour) %>%
  summarise(avg_intensity = mean(TotalIntensity, na.rm = TRUE)) %>%
  ggplot(aes(hour, avg_intensity)) +
  geom_line() +
  labs(
    title = "Average Activity Intensity by Hour of Day",
    x = "Hour of Day",
    y = "Average Intensity"
  )

Activity intensity follows a daily pattern, peaking during daytime hours and declining at night. This reflects typical daily routines and highlights optimal periods for activity-based engagement or reminders.
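
To name the peak hours rather than estimate them from the line chart, the hourly averages can be ranked. This is a sketch on invented hourly records (`toy_hourly`); with the real data, `intensity` would take its place.

```r
library(dplyr)
library(lubridate)

# Invented hourly records across two days.
toy_hourly <- tibble(
  ActivityHour = ymd_hms(c("2016-04-12 08:00:00", "2016-04-12 18:00:00",
                           "2016-04-13 08:00:00", "2016-04-13 18:00:00",
                           "2016-04-13 23:00:00")),
  TotalIntensity = c(10, 30, 14, 26, 2)
)

hourly_avg <- toy_hourly %>%
  mutate(hour = hour(ActivityHour)) %>%
  group_by(hour) %>%
  summarise(avg_intensity = mean(TotalIntensity, na.rm = TRUE), .groups = "drop")

# Hours ranked from most to least intense; the first row is the peak hour.
peak_hours <- hourly_avg %>% arrange(desc(avg_intensity))
```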

9 Key Insights

  1. Daily activity levels vary widely among users

The distribution of daily steps shows that while some users are highly active, many record relatively low step counts on most days. This indicates uneven engagement with physical activity and potential opportunities to encourage more consistent movement.

  2. Higher activity is strongly associated with higher calorie expenditure

There is a clear positive relationship between total steps and calories burned. Users who take more steps tend to burn more calories, reinforcing the value of step-based activity as a simple and effective health metric.

  3. Most recorded activity time is low-intensity

Lightly active and sedentary minutes dominate daily activity patterns, while very active minutes are comparatively limited. This suggests that many users engage in movement but may struggle to reach higher-intensity activity levels.

  4. Activity intensity follows a daily time pattern

Average activity intensity peaks during daytime hours and declines markedly at night. This reflects typical daily routines and highlights opportunities for timely nudges during peak activity windows.

  5. Sleep data is available for a subset of users

Sleep duration varies considerably among users who track sleep. While this dataset is smaller, it provides useful context for understanding how activity and rest may interact for engaged users.

10 Actionable Recommendations for Bellabeat

  1. Promote step-based goals as a core engagement feature

Since steps strongly correlate with calorie burn, Bellabeat can emphasize personalized daily step goals that adapt to user activity history rather than fixed targets. This lowers the barrier for less active users while still motivating improvement.

  2. Encourage short bursts of higher-intensity activity

Given the dominance of light activity, Bellabeat can introduce micro-workout prompts (5–10 minutes) that encourage users to transition from light to moderate or vigorous activity, especially during peak daytime hours.

  3. Leverage time-of-day insights for smart notifications

Activity intensity patterns suggest optimal times for engagement. Bellabeat can send context-aware reminders during periods when users are most likely to be active, increasing the effectiveness of in-app nudges.

  4. Integrate sleep-aware activity recommendations

For users who track sleep, Bellabeat can tailor activity suggestions based on sleep duration, promoting recovery-aware fitness guidance and reinforcing Bellabeat’s holistic health positioning.

  5. Improve data completeness through user education

The limited use of weight and sleep tracking suggests an opportunity to educate users on the benefits of logging these metrics. Clear messaging on how these features enhance personalized insights could increase adoption.

11 Remark

These insights highlight opportunities for Bellabeat to increase user engagement by promoting achievable activity goals, timely interventions, and personalized health guidance aligned with real-world user behavior.

12 Closing Note

Thank you for your interest in my Bellabeat Case Study!

This project represents my first end-to-end case study using R for data cleaning, analysis, and visualization. I welcome feedback and suggestions for further improvement.