Introduction and Background

Bellabeat, a company specializing in women’s wellness technology, aims to leverage data analytics to gain insights into user behavior and optimize its products. This case study explores user activity, sleep patterns, and engagement levels using Fitbit data to provide actionable recommendations that can help Bellabeat enhance its product offerings.

1. Uploading and Preparing Data

The dataset used in this study is sourced from Kaggle’s Fitbit dataset, which contains multiple CSV files with user activity, sleep, and health metrics.

# Load necessary libraries
install.packages("tidyverse", dependencies=TRUE)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load datasets
daily_activity <- read.csv("dailyActivity_merged.csv")
sleep_data <- read.csv("sleepDay_merged.csv")

2. Exploring the Data

Before diving into analysis, let’s explore the structure and summary statistics of our datasets.

# View the first few rows
head(daily_activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728
head(sleep_data)
##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320
# Column names
colnames(daily_activity)
##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"
colnames(sleep_data)
## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"

Unique Participants and Observations

n_distinct(daily_activity$Id) # Unique users in activity data
## [1] 33
n_distinct(sleep_data$Id)     # Unique users in sleep data
## [1] 24
nrow(daily_activity) # Total observations in daily activity
## [1] 940
nrow(sleep_data)     # Total observations in sleep data
## [1] 413

3. Data Cleaning and Processing

To ensure data integrity, we perform the following steps:

  • Remove duplicates
  • Convert date columns to a proper Date format
  • Handle missing values
  • Filter out inconsistencies

# Convert date format
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, format="%m/%d/%Y")
sleep_data$SleepDay <- as.Date(sleep_data$SleepDay, format="%m/%d/%Y")
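
The same format string works for both columns even though SleepDay carries a time component, because as.Date() ignores any trailing characters once the format has been matched. A minimal sketch using the two raw string shapes shown by head() above (the variable names are illustrative, not part of the original analysis):

```r
# Raw strings in the two shapes seen in ActivityDate and SleepDay
activity_raw <- "4/12/2016"
sleep_raw    <- "4/12/2016 12:00:00 AM"

# as.Date() reads only as much of the string as the format needs;
# the trailing " 12:00:00 AM" is ignored
activity_date <- as.Date(activity_raw, format = "%m/%d/%Y")
sleep_date    <- as.Date(sleep_raw, format = "%m/%d/%Y")

# Both values parse to the same calendar day
activity_date == sleep_date
```

Having both columns as Date values is what makes a later comparison or join on the date possible.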

# Check for missing values
colSums(is.na(daily_activity))
##                       Id             ActivityDate               TotalSteps 
##                        0                        0                        0 
##            TotalDistance          TrackerDistance LoggedActivitiesDistance 
##                        0                        0                        0 
##       VeryActiveDistance ModeratelyActiveDistance      LightActiveDistance 
##                        0                        0                        0 
##  SedentaryActiveDistance        VeryActiveMinutes      FairlyActiveMinutes 
##                        0                        0                        0 
##     LightlyActiveMinutes         SedentaryMinutes                 Calories 
##                        0                        0                        0
colSums(is.na(sleep_data))
##                 Id           SleepDay  TotalSleepRecords TotalMinutesAsleep 
##                  0                  0                  0                  0 
##     TotalTimeInBed 
##                  0
# Remove duplicates
daily_activity <- distinct(daily_activity)
sleep_data <- distinct(sleep_data)
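
Counting duplicates before dropping them makes the effect of the de-duplication step visible. The sketch below uses a made-up three-row data frame (the toy rows are assumptions for illustration) and base R's duplicated() and unique(), which behave like dplyr's distinct() for exact duplicate rows:

```r
# Toy data frame: rows 2 and 3 are exact duplicates (hypothetical rows)
toy_sleep <- data.frame(
  Id = c(1, 2, 2),
  SleepDay = as.Date(c("2016-04-12", "2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(327, 384, 384)
)

# duplicated() flags every row that repeats an earlier row
n_dupes <- sum(duplicated(toy_sleep))

# unique() on a data frame keeps the first copy of each row
toy_clean <- unique(toy_sleep)

n_dupes          # number of duplicate rows found
nrow(toy_clean)  # rows remaining after de-duplication
```

Running the same two checks on sleep_data and daily_activity would show how many rows the cleaning step actually removed.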

4. Data Analysis & Insights

Summary Statistics

summary(daily_activity %>% select(TotalSteps, TotalDistance, SedentaryMinutes))
##    TotalSteps    TotalDistance    SedentaryMinutes
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0  
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8  
##  Median : 7406   Median : 5.245   Median :1057.5  
##  Mean   : 7638   Mean   : 5.490   Mean   : 991.2  
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5  
##  Max.   :36019   Max.   :28.030   Max.   :1440.0
summary(sleep_data %>% select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed))
##  TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
##  Min.   :1.00      Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:1.00      1st Qu.:361.0      1st Qu.:403.8  
##  Median :1.00      Median :432.5      Median :463.0  
##  Mean   :1.12      Mean   :419.2      Mean   :458.5  
##  3rd Qu.:1.00      3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :3.00      Max.   :796.0      Max.   :961.0
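
The summary shows a minimum of 0 steps and a maximum of 1440 sedentary minutes (a full 24 hours). One possible reading, which is an assumption rather than something the data confirms, is that the tracker was not worn on those days. A minimal sketch of flagging and filtering such rows on toy data (the rows and the rule itself are illustrative):

```r
# Toy activity records; the second row mimics a "tracker not worn" day
toy_activity <- data.frame(
  Id = c(1, 1, 2),
  TotalSteps = c(13162, 0, 10735),
  SedentaryMinutes = c(728, 1440, 776)
)

# Assumed rule: zero steps and a fully sedentary day means no wear
not_worn <- toy_activity$TotalSteps == 0 &
  toy_activity$SedentaryMinutes == 1440

# Keep only the days that do not match the rule
toy_filtered <- toy_activity[!not_worn, ]

nrow(toy_filtered)  # days kept for analysis
```

Whether to drop such days depends on the question being asked, so the rule is best treated as a sensitivity check rather than a fixed cleaning step.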

5. Data Visualization

Activity vs. Sedentary Time

ggplot(data=daily_activity, aes(x=TotalSteps, y=SedentaryMinutes)) +
  geom_point(color='blue') +
  labs(title="Steps vs. Sedentary Minutes",
       x="Total Steps", y="Sedentary Minutes")

Sleep Duration vs. Time in Bed

ggplot(data=sleep_data, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) +
  geom_point(color='green') +
  labs(title="Sleep Duration vs. Time in Bed",
       x="Minutes Asleep", y="Total Time in Bed")
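
The gap between the two plotted variables can also be expressed per record. The sketch below takes the first three rows shown by head(sleep_data) and derives two columns; the names MinutesAwakeInBed and SleepEfficiency are hypothetical and do not exist in the dataset:

```r
# First three sleep records as printed by head(sleep_data)
toy_sleep <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412),
  TotalTimeInBed     = c(346, 407, 442)
)

# Minutes spent in bed but not asleep
toy_sleep$MinutesAwakeInBed <-
  toy_sleep$TotalTimeInBed - toy_sleep$TotalMinutesAsleep

# Share of time in bed spent asleep (between 0 and 1)
toy_sleep$SleepEfficiency <-
  toy_sleep$TotalMinutesAsleep / toy_sleep$TotalTimeInBed

toy_sleep
```

Points far from the diagonal of the scatter plot correspond to records with a low value in the derived efficiency column.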

6. Merging Datasets for Deeper Insights

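To analyze relationships between sleep and activity, we merge the datasets on both the Id field and the date, so that each night of sleep is paired with the same user's activity for that day.

# Join on user and date; both date columns were converted to Date in step 3
combined_data <- merge(sleep_data, daily_activity,
                       by.x=c("Id", "SleepDay"), by.y=c("Id", "ActivityDate"))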
n_distinct(combined_data$Id)
## [1] 24
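
The choice of join key changes the number of rows in the result. A join keyed only on Id pairs every sleep record with every activity record of that user, while a join keyed on Id plus date keeps one row per user-day. A minimal sketch on toy data (two days for a single user, values taken from the head() output above):

```r
# Two sleep records and two activity records for the same user
toy_sleep <- data.frame(
  Id = c(1, 1),
  SleepDay = as.Date(c("2016-04-12", "2016-04-13")),
  TotalMinutesAsleep = c(327, 384)
)
toy_activity <- data.frame(
  Id = c(1, 1),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13")),
  TotalSteps = c(13162, 10735)
)

# Keyed on Id only: every sleep row meets every activity row
by_id <- merge(toy_sleep, toy_activity, by = "Id")

# Keyed on Id and date: one row per user-day
by_id_date <- merge(toy_sleep, toy_activity,
                    by.x = c("Id", "SleepDay"),
                    by.y = c("Id", "ActivityDate"))

nrow(by_id)       # 4
nrow(by_id_date)  # 2
```

Comparing nrow(combined_data) against nrow(sleep_data) is a quick way to check for this kind of row inflation in the real merge.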

Correlation Between Sleep and Activity

ggplot(combined_data, aes(x=TotalMinutesAsleep, y=TotalSteps)) +
  geom_point(color='purple') +
  labs(title="Relationship Between Sleep and Activity",
       x="Total Minutes Asleep", y="Total Steps")
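
A scatter plot shows the shape of a relationship but not its strength. One way to quantify it is a Pearson correlation coefficient via cor() and cor.test(). The sketch below runs on made-up values (the five rows are assumptions for illustration, not results from the Fitbit data); the same calls could be applied to the matching columns of combined_data:

```r
# Hypothetical sleep and step values, for illustration only
toy <- data.frame(
  TotalMinutesAsleep = c(300, 360, 420, 480, 540),
  TotalSteps         = c(9000, 7500, 8200, 6100, 7000)
)

# Pearson correlation coefficient, always between -1 and 1
r <- cor(toy$TotalMinutesAsleep, toy$TotalSteps, method = "pearson")

# cor.test() adds a p-value and a confidence interval
ct <- cor.test(toy$TotalMinutesAsleep, toy$TotalSteps)

r
ct$p.value
```

A coefficient computed this way on the merged data would show whether the relationship in the plot is weak, moderate, or strong.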

7. Key Findings & Recommendations

Key Insights:

  • Users with higher sedentary minutes tend to have lower step counts, indicating potential for intervention strategies.
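  • The scatter plot of sleep against steps appears to show a moderate association between sleep and activity, suggesting that well-rested individuals may have higher activity levels; the strength of this relationship is not quantified with a correlation coefficient in this analysis.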
  • Marketing campaigns can target users with lower step counts to promote Bellabeat’s features that encourage movement.

Recommendations for Bellabeat:

  1. Personalized Activity Reminders: Notify users with low step counts to stay active.
  2. Sleep Optimization Features: Provide insights on ideal sleep durations for better activity performance.
  3. Gamification & Rewards: Encourage step goals with reward-based challenges.
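
Targeting users with low step counts first requires a rule for who counts as "low". A minimal sketch that averages daily steps per user and assigns a segment; the thresholds of 5,000 and 10,000 steps, the column names MeanSteps and Segment, and the toy rows are all assumptions chosen for illustration:

```r
# Hypothetical daily step counts for three users
toy_activity <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3),
  TotalSteps = c(13162, 10735, 4200, 3800, 7400, 8100)
)

# Mean daily steps per user
avg_steps <- aggregate(TotalSteps ~ Id, data = toy_activity, FUN = mean)
names(avg_steps)[2] <- "MeanSteps"

# Assumed cut points: below 5000 = low, 5000 up to 10000 = medium, else high
avg_steps$Segment <- cut(
  avg_steps$MeanSteps,
  breaks = c(-Inf, 5000, 10000, Inf),
  labels = c("low", "medium", "high"),
  right = FALSE
)

avg_steps
```

Users in the "low" segment would be the natural audience for activity reminders, while step-goal challenges could be tuned per segment.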

8. Conclusion

By leveraging Fitbit data, Bellabeat can enhance its product strategy to increase user engagement and promote wellness. These insights enable Bellabeat to refine marketing strategies and develop features tailored to users’ needs.


Next Steps: Further analysis can explore heart rate data, demographic segmentation, and activity trends over time for deeper personalization.