Bellabeat, a company specializing in women’s wellness technology, aims to leverage data analytics to gain insights into user behavior and optimize its products. This case study explores user activity, sleep patterns, and engagement levels using Fitbit data to provide actionable recommendations that can help Bellabeat enhance its product offerings.
The dataset used in this study is sourced from Kaggle’s Fitbit dataset, which contains multiple CSV files with user activity, sleep, and health metrics.
# Load necessary libraries
install.packages("tidyverse", dependencies=TRUE)
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load datasets
daily_activity <- read.csv("dailyActivity_merged.csv")
sleep_data <- read.csv("sleepDay_merged.csv")
Before diving into analysis, let’s explore the structure and summary statistics of our datasets.
# View the first few rows
head(daily_activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728
head(sleep_data)
##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320
# Column names
colnames(daily_activity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
colnames(sleep_data)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
n_distinct(daily_activity$Id) # Unique users in activity data
## [1] 33
n_distinct(sleep_data$Id) # Unique users in sleep data
## [1] 24
nrow(daily_activity) # Total observations in daily activity
## [1] 940
nrow(sleep_data) # Total observations in sleep data
## [1] 413
To ensure data integrity, we perform the following steps:
- Remove duplicates
- Convert date columns to proper format
- Handle missing values
- Filter out inconsistencies
# Convert date format
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, format="%m/%d/%Y")
sleep_data$SleepDay <- as.Date(sleep_data$SleepDay, format="%m/%d/%Y")
# Check for missing values
colSums(is.na(daily_activity))
##                       Id             ActivityDate               TotalSteps
##                        0                        0                        0
##            TotalDistance          TrackerDistance LoggedActivitiesDistance
##                        0                        0                        0
##       VeryActiveDistance ModeratelyActiveDistance      LightActiveDistance
##                        0                        0                        0
##  SedentaryActiveDistance        VeryActiveMinutes      FairlyActiveMinutes
##                        0                        0                        0
##     LightlyActiveMinutes         SedentaryMinutes                 Calories
##                        0                        0                        0
colSums(is.na(sleep_data))
##                 Id           SleepDay  TotalSleepRecords TotalMinutesAsleep
##                  0                  0                  0                  0
##     TotalTimeInBed
##                  0
# Remove duplicates
daily_activity <- distinct(daily_activity)
sleep_data <- distinct(sleep_data)
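The cleaning steps listed earlier include filtering out inconsistencies, which the code above does not show. A minimal sketch of one way to do it in base R follows. The helper names `drop_inconsistent_sleep()` and `drop_unworn_days()` and the rules they apply are assumptions made for illustration, not part of the original analysis: a sleep record is treated as inconsistent when minutes asleep exceed time in bed, and an activity day is treated as unworn when it has zero steps.

```r
# Hypothetical helpers; the rules below are assumptions, not the original method.

# Keep sleep records where 0 < minutes asleep <= minutes in bed.
drop_inconsistent_sleep <- function(sleep) {
  ok <- sleep$TotalMinutesAsleep > 0 &
    sleep$TotalMinutesAsleep <= sleep$TotalTimeInBed
  sleep[which(ok), , drop = FALSE]  # which() also drops rows where the check is NA
}

# Keep activity days with at least one recorded step.
drop_unworn_days <- function(activity) {
  activity[which(activity$TotalSteps > 0), , drop = FALSE]
}

# Toy rows using the same column names as the Fitbit CSV files.
toy_sleep <- data.frame(
  Id = c(1, 1, 2),
  TotalMinutesAsleep = c(400, 500, 0),
  TotalTimeInBed = c(430, 450, 60)
)
toy_activity <- data.frame(
  Id = c(1, 2),
  TotalSteps = c(13162, 0),
  SedentaryMinutes = c(728, 1440)
)

clean_sleep <- drop_inconsistent_sleep(toy_sleep)      # keeps only the first row
clean_activity <- drop_unworn_days(toy_activity)       # keeps only the first row
```

On the real data, the same helpers could be applied to `sleep_data` and `daily_activity` after the duplicate removal, provided the thresholds are reviewed first.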
summary(daily_activity %>% select(TotalSteps, TotalDistance, SedentaryMinutes))
##    TotalSteps    TotalDistance    SedentaryMinutes
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8
##  Median : 7406   Median : 5.245   Median :1057.5
##  Mean   : 7638   Mean   : 5.490   Mean   : 991.2
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5
##  Max.   :36019   Max.   :28.030   Max.   :1440.0
summary(sleep_data %>% select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed))
##  TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
##  Min.   :1.00      Min.   : 58.0      Min.   : 61.0
##  1st Qu.:1.00      1st Qu.:361.0      1st Qu.:403.8
##  Median :1.00      Median :432.5      Median :463.0
##  Mean   :1.12      Mean   :419.2      Mean   :458.5
##  3rd Qu.:1.00      3rd Qu.:490.0      3rd Qu.:526.0
##  Max.   :3.00      Max.   :796.0      Max.   :961.0
ggplot(data=daily_activity, aes(x=TotalSteps, y=SedentaryMinutes)) +
  geom_point(color='blue') +
  labs(title="Steps vs. Sedentary Minutes",
       x="Total Steps", y="Sedentary Minutes")
ggplot(data=sleep_data, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) +
  geom_point(color='green') +
  labs(title="Sleep Duration vs. Time in Bed",
       x="Minutes Asleep", y="Total Time in Bed")
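The gap between the two axes of the sleep plot can also be computed directly as time spent in bed but not asleep. The sketch below uses the six rows printed by `head(sleep_data)` earlier in the document; the column name `MinutesAwakeInBed` is an assumption introduced here for illustration.

```r
# Rows copied from the head(sleep_data) output shown earlier.
sleep_head <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  TotalTimeInBed = c(346, 407, 442, 367, 712, 320)
)

# MinutesAwakeInBed is a hypothetical derived column: time in bed minus time asleep.
sleep_head$MinutesAwakeInBed <-
  sleep_head$TotalTimeInBed - sleep_head$TotalMinutesAsleep

sleep_head$MinutesAwakeInBed  # 19 23 30 27 12 16
```

The same subtraction on the full `sleep_data` frame would give one value per user-night.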
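To analyze relationships between sleep and activity, we merge the datasets on both the Id field and the date, so that each row pairs a user's sleep record with that user's activity on the same day.
combined_data <- merge(sleep_data, daily_activity,
                       by.x=c("Id", "SleepDay"), by.y=c("Id", "ActivityDate"))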
n_distinct(combined_data$Id)
## [1] 24
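The choice of join key changes the shape of the merged data. With `Id` as the only key, `merge()` pairs every sleep record of a user with every activity record of that user; with `Id` plus the date, it keeps one row per user-day. The toy sketch below illustrates the difference using two days taken from the `head()` outputs above; the frame names `sleep_toy` and `activity_toy` are hypothetical.

```r
# Two days for one user, with values taken from the head() outputs above.
sleep_toy <- data.frame(
  Id = c(1503960366, 1503960366),
  SleepDay = as.Date(c("2016-04-12", "2016-04-13")),
  TotalMinutesAsleep = c(327, 384)
)
activity_toy <- data.frame(
  Id = c(1503960366, 1503960366),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13")),
  TotalSteps = c(13162, 10735)
)

# Key = Id only: each of the 2 sleep rows matches both activity rows.
by_id <- merge(sleep_toy, activity_toy, by = "Id")

# Key = Id + date: each sleep row matches only the same-day activity row.
by_id_date <- merge(sleep_toy, activity_toy,
                    by.x = c("Id", "SleepDay"),
                    by.y = c("Id", "ActivityDate"))

nrow(by_id)       # 4
nrow(by_id_date)  # 2
```

Comparing `nrow()` of the merged frame against `nrow(sleep_data)` is a quick way to check which situation applies.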
ggplot(combined_data, aes(x=TotalMinutesAsleep, y=TotalSteps)) +
  geom_point(color='purple') +
  labs(title="Relationship Between Sleep and Activity",
       x="Total Minutes Asleep", y="Total Steps")
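A scatter plot shows the relationship visually, while Pearson's correlation coefficient summarizes its direction and strength in a single number. The sketch below pairs the five user-days that appear in both `head()` outputs earlier in the document, so the value it produces is illustrative only and says nothing about the full dataset. The frame name `paired` is hypothetical.

```r
# Five user-days present in both head() outputs (4/12, 4/13, 4/15, 4/16, 4/17).
paired <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700),
  TotalSteps = c(13162, 10735, 9762, 12669, 9705)
)

# Pearson correlation; use = "complete.obs" skips rows with missing values.
r <- cor(paired$TotalMinutesAsleep, paired$TotalSteps, use = "complete.obs")
r
```

On the real data, the same call would take `combined_data$TotalMinutesAsleep` and `combined_data$TotalSteps` as its arguments.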
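By leveraging Fitbit data, Bellabeat can enhance its product strategy to increase user engagement and promote wellness. The summary statistics above show that users average about 7,638 steps per day and about 991 sedentary minutes (roughly 16.5 hours), and that they sleep about 419 minutes per night (roughly 7 hours) out of about 459 minutes in bed. Only 24 of the 33 users logged any sleep data, which suggests that sleep tracking is used less consistently than activity tracking. These insights enable Bellabeat to refine marketing strategies and develop features tailored to users’ needs.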
Next Steps: Further analysis can explore heart rate data, demographic segmentation, and activity trends over time for deeper personalization.
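As a starting point for the activity-trend analysis, steps can be averaged by day of the week. The sketch below is a minimal base R version that uses the six rows printed by `head(daily_activity)` earlier; the column name `IsoWeekday` and the frame name `steps_by_weekday` are assumptions introduced here. It uses the `%u` format code (1 = Monday through 7 = Sunday) so the result does not depend on the system locale.

```r
# Rows copied from the head(daily_activity) output shown earlier.
activity_head <- data.frame(
  ActivityDate = as.Date(c("4/12/2016", "4/13/2016", "4/14/2016",
                           "4/15/2016", "4/16/2016", "4/17/2016"),
                         format = "%m/%d/%Y"),
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705)
)

# IsoWeekday is a hypothetical column: "1" = Monday ... "7" = Sunday.
activity_head$IsoWeekday <- format(activity_head$ActivityDate, "%u")

# Mean steps per weekday; with one row per weekday here, each mean is that row's value.
steps_by_weekday <- aggregate(TotalSteps ~ IsoWeekday,
                              data = activity_head, FUN = mean)
steps_by_weekday
```

Applied to the full `daily_activity` frame, the same aggregation would average across all users and weeks.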