[Date: 18-01-2026] This project will produce a repo with the following deliverables:
“Analyze smart-device usage patterns from available fitness tracker data to identify trends and insights that can guide Bellabeat’s product development and marketing strategy for non-smart wellness products.”
fitbase_1 <- read.csv("~/coursera case study project/case stdy 2 _f/mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/dailyActivity_merged.csv")
fitbase_2 <- read.csv("~/coursera case study project/case stdy 2 _f/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
colnames(fitbase_1)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
fitbase_1 <- fitbase_1 %>%
  mutate(ActivityDate = as.Date(ActivityDate, format = "%m/%d/%Y"))
fitbase_2 <- fitbase_2 %>%
  mutate(SleepDay = as.Date(SleepDay, format = "%m/%d/%Y %I:%M:%S %p")) %>%
  rename(ActivityDate = SleepDay)
fitbase <- left_join(fitbase_1, fitbase_2, by = c("Id", "ActivityDate"))
glimpse(fitbase)
## Rows: 457
## Columns: 18
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate <date> 2016-03-25, 2016-03-26, 2016-03-27, 2016-03-…
## $ TotalSteps <int> 11004, 17609, 12736, 13231, 12041, 10970, 122…
## $ TotalDistance <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.…
## $ TrackerDistance <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 2.57, 6.92, 4.66, 3.19, 2.16, 2.36, 2.29, 3.3…
## $ ModeratelyActiveDistance <dbl> 0.46, 0.73, 0.16, 0.79, 1.09, 0.51, 0.49, 0.8…
## $ LightActiveDistance <dbl> 4.07, 3.91, 3.71, 4.95, 4.61, 4.29, 5.04, 3.6…
## $ SedentaryActiveDistance <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.0…
## $ VeryActiveMinutes <int> 33, 89, 56, 39, 28, 30, 33, 47, 40, 15, 43, 3…
## $ FairlyActiveMinutes <int> 12, 17, 5, 20, 28, 13, 12, 21, 11, 30, 18, 18…
## $ LightlyActiveMinutes <int> 205, 274, 268, 224, 243, 223, 239, 200, 244, …
## $ SedentaryMinutes <int> 804, 588, 605, 1080, 763, 1174, 820, 866, 636…
## $ Calories <int> 1819, 2154, 1944, 1932, 1886, 1820, 1889, 186…
## $ TotalSleepRecords <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ TotalMinutesAsleep <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ TotalTimeInBed <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
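The sleep columns are NA in every row shown above. The file paths suggest the activity file comes from the 3.12.16-4.11.16 export and the sleep file from the 4.12.16-5.12.16 export, so the two tables may barely overlap in dates. A minimal sketch of how that could be checked before relying on the join. The tables `activity` and `sleep` are hypothetical toy stand-ins for fitbase_1 and fitbase_2, and their values are made up:

```r
library(dplyr)

# Toy stand-ins for fitbase_1 and fitbase_2 (made-up values, for illustration only)
activity <- data.frame(
  Id = c(1, 1, 1, 2),
  ActivityDate = as.Date(c("2016-04-10", "2016-04-11", "2016-04-12", "2016-04-12"))
)
sleep <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-13")),
  TotalMinutesAsleep = c(327, 384, 412)
)

# Date range covered by each table
range(activity$ActivityDate)
range(sleep$ActivityDate)

# Activity rows with a matching sleep record, and those without one
matched <- semi_join(activity, sleep, by = c("Id", "ActivityDate"))
unmatched <- anti_join(activity, sleep, by = c("Id", "ActivityDate"))

nrow(matched)    # rows that would carry sleep values after left_join
nrow(unmatched)  # rows whose sleep columns would be NA
```

If the real tables show little or no overlap, one option is to read the activity file from the same export period as the sleep file.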
Date: 19-01-2026
▼ Tools used
1. MySQL
2. R (Programming)
3. Tableau
4. Excel
▼ Dividing the process into two parts:
🔹 Part 1: Activity-based behavior (ALL USERS)
• Steps
• Calories
• Active minutes
• Sedentary time
🔹 Part 2: Sleep-based behavior (ONLY USERS WITH SLEEP DATA)
• Filter users/days where sleep is available
• Explicitly mention this limitation
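The two-part split above could be sketched roughly as follows. `fitbase_demo` is a hypothetical toy stand-in for the joined `fitbase` table with made-up values, and the `coverage` table is just one possible way to state the sleep-data limitation in numbers:

```r
library(dplyr)

# Toy stand-in for the joined fitbase table (made-up values)
fitbase_demo <- data.frame(
  Id = c(1, 1, 2, 2, 3),
  TotalSteps = c(11004, 9500, 4200, 3900, 7000),
  TotalMinutesAsleep = c(327, NA, NA, NA, 412)
)

# Part 1: the activity analysis keeps every row
activity_part <- fitbase_demo

# Part 2: the sleep analysis keeps only days with a recorded sleep value
sleep_part <- fitbase_demo %>%
  filter(!is.na(TotalMinutesAsleep))

# Coverage figures to report next to any sleep finding
coverage <- data.frame(
  users_total = n_distinct(activity_part$Id),
  users_with_sleep = n_distinct(sleep_part$Id),
  days_total = nrow(activity_part),
  days_with_sleep = nrow(sleep_part)
)
coverage
```

Reporting the coverage numbers with the sleep results makes the limitation explicit instead of leaving it implied.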
19-01-2026
Sorted the tables, then checked for duplicate Id and ActivityDate pairs.
library(dplyr)
dup_check <- fitbase %>% count(Id, ActivityDate) %>% filter(n > 1)
dup_check
## [1] Id ActivityDate n
## <0 rows> (or 0-length row.names)
str(fitbase)
## 'data.frame': 457 obs. of 18 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : Date, format: "2016-03-25" "2016-03-26" ...
## $ TotalSteps : int 11004 17609 12736 13231 12041 10970 12256 12262 11248 10016 ...
## $ TotalDistance : num 7.11 11.55 8.53 8.93 7.85 ...
## $ TrackerDistance : num 7.11 11.55 8.53 8.93 7.85 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 2.57 6.92 4.66 3.19 2.16 ...
## $ ModeratelyActiveDistance: num 0.46 0.73 0.16 0.79 1.09 ...
## $ LightActiveDistance : num 4.07 3.91 3.71 4.95 4.61 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 33 89 56 39 28 30 33 47 40 15 ...
## $ FairlyActiveMinutes : int 12 17 5 20 28 13 12 21 11 30 ...
## $ LightlyActiveMinutes : int 205 274 268 224 243 223 239 200 244 314 ...
## $ SedentaryMinutes : int 804 588 605 1080 763 1174 820 866 636 655 ...
## $ Calories : int 1819 2154 1944 1932 1886 1820 1889 1868 1843 1850 ...
## $ TotalSleepRecords : int NA NA NA NA NA NA NA NA NA NA ...
## $ TotalMinutesAsleep : int NA NA NA NA NA NA NA NA NA NA ...
## $ TotalTimeInBed : int NA NA NA NA NA NA NA NA NA NA ...
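In the str() output above, Id is numeric and prints as 1.5e+09, which makes users hard to tell apart. Since an Id is a label rather than a quantity, one option is to store it as text. A small sketch on a toy table (`demo` is a hypothetical name; the Id value is the one visible in the glimpse output):

```r
# Toy table with the Id visible in the output above
demo <- data.frame(
  Id = c(1503960366, 1503960366),
  TotalSteps = c(11004L, 17609L)
)

# sprintf("%.0f", ...) writes every digit and never switches to scientific notation
demo$Id <- sprintf("%.0f", demo$Id)

str(demo)
```

The same conversion would need to be applied to both tables before the join so that the key types still match.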
21-01-2026 [Analysis using R]
library(dplyr)
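# Per-user daily averages (avg_sedimentary_minutes holds the sedentary minutes)
user_summary <- fitbase %>%
  group_by(Id) %>%
  summarise(
    avg_steps = mean(TotalSteps, na.rm = TRUE),
    avg_calories = mean(Calories, na.rm = TRUE),
    avg_sedimentary_minutes = mean(SedentaryMinutes, na.rm = TRUE),
    # Active minutes = very + fairly + lightly active minutes per day
    avg_active_minutes = mean(VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes, na.rm = TRUE)
  )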
str(user_summary)
## tibble [35 × 5] (S3: tbl_df/tbl/data.frame)
## $ Id : num [1:35] 1.50e+09 1.62e+09 1.64e+09 1.84e+09 1.93e+09 ...
## $ avg_steps : num [1:35] 11641 4226 9275 3641 2181 ...
## $ avg_calories : num [1:35] 1796 1353 2916 1616 2254 ...
## $ avg_sedimentary_minutes: num [1:35] 810 1278 1034 1035 953 ...
## $ avg_active_minutes : num [1:35] 280 122 286 160 113 ...
View(user_summary)
library(ggplot2)
A. Average steps per user
ggplot(user_summary, aes(x = factor(Id), y = avg_steps)) +
  geom_col() +
  labs(
    title = "Average Daily Steps per User",
    x = "User ID",
    y = "Average Steps"
  ) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
B. Sedentary time vs steps
ggplot(user_summary, aes(x = avg_steps, y = avg_sedimentary_minutes)) +
  geom_point() +
  labs(
    title = "Sedentary Time vs Average Steps",
    x = "Average Steps",
    y = "Average Sedentary Minutes"
  )
C. Calories vs activity
ggplot(user_summary, aes(x = avg_active_minutes, y = avg_calories)) +
  geom_point() +
  labs(
    title = "Active Minutes vs Calories Burned",
    x = "Average Active Minutes",
    y = "Average Calories"
  )
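The scatter plots in B and C show the direction of each relationship only by eye. A correlation coefficient would put a number on it. A minimal sketch, assuming Pearson correlation is an acceptable summary here: `summary_demo` is a hypothetical stand-in that holds only the first five users printed by str(user_summary), so its result is an illustration and not a finding.

```r
# First five users as printed by str(user_summary); too few rows for a real conclusion
summary_demo <- data.frame(
  avg_steps = c(11641, 4226, 9275, 3641, 2181),
  avg_sedimentary_minutes = c(810, 1278, 1034, 1035, 953),
  avg_active_minutes = c(280, 122, 286, 160, 113),
  avg_calories = c(1796, 1353, 2916, 1616, 2254)
)

# Pearson correlation behind plot B (steps vs sedentary minutes)
cor_steps_sedentary <- cor(summary_demo$avg_steps,
                           summary_demo$avg_sedimentary_minutes)

# Pearson correlation behind plot C (active minutes vs calories)
cor_active_calories <- cor(summary_demo$avg_active_minutes,
                           summary_demo$avg_calories)

round(c(steps_vs_sedentary = cor_steps_sedentary,
        active_vs_calories = cor_active_calories), 2)
```

On the full table the same two calls would take `user_summary` in place of `summary_demo`.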
✅ Recommendation 1: Habit-building over performance. Since most users show moderate activity with high sedentary time, Bellabeat should emphasize daily habit formation rather than intense fitness goals.
✅ Recommendation 2: Non-smart product positioning. Non-smart wellness products (journals, hydration reminders, mindfulness tools) should target sedentary users, positioning wellness as achievable without technology overload.
✅ Recommendation 3: Marketing messaging. Campaigns should focus on “small daily progress”, aligning with users who are not highly active but consistently engaged.
✅ Recommendation 4: Sleep features (future opportunity). Given the limited sleep data, Bellabeat can differentiate by encouraging manual sleep reflection in non-smart products to complement wearable insights.
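The claim in Recommendation 1 that most users are moderately active could be backed by grouping users into activity levels. A rough sketch: the step thresholds (5000, 7500, 10000) and the level names are assumptions chosen for illustration, not values taken from this analysis, and `avg_steps` holds only the first five averages printed by str(user_summary).

```r
# First five per-user step averages as printed by str(user_summary)
avg_steps <- c(11641, 4226, 9275, 3641, 2181)

# Assumed thresholds and labels; adjust to whatever definition the report adopts
activity_level <- cut(
  avg_steps,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("sedentary", "low active", "somewhat active", "active"),
  right = FALSE  # intervals are [lower, upper)
)

# Share of users in each level
level_share <- prop.table(table(activity_level))
level_share
```

Running this on all 35 users would show whether the shares support the wording used in the recommendations.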