OBJECTIVE: To analyze user activity patterns and engagement behavior in Bellabeat fitness tracker data.
KEY INSIGHTS
70% of users (23/33) showed zero engagement with activity logging, while only a small fraction demonstrated consistent usage, indicating a significant engagement gap within the user base.
Consistency in meeting activity thresholds strongly correlates with higher total movement (step count).
Higher consistency is associated with significantly greater time spent in moderate-to-high intensity activity, while light activity remains relatively similar across groups.
Consistently active users exhibit ~3 hours lower sedentary time compared to less active groups, indicating a meaningful behavioral difference.
library(tidyverse)
library(rmarkdown)
dailyActivity_March_April <- read_csv("mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/dailyActivity_merged.csv")
dailyActivity_April_May <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
Data from two months (March-April and April-May) are combined into one file Uncommon ID’s are removed
common_ids <- intersect(dailyActivity_March_April$Id,
dailyActivity_April_May$Id)
dailyActivity_March_April_IDCleaned <- dailyActivity_March_April %>%
filter(Id %in% common_ids)
setequal(dailyActivity_March_April_IDCleaned$Id, dailyActivity_April_May$Id) #True
dailyActivity_combined <- bind_rows(dailyActivity_March_April_IDCleaned, dailyActivity_April_May)
Date which is originally formatted as string is formatted to Date
dailyActivity_DateCleaned <- dailyActivity_combined %>%
mutate(ActivityDate_cleaned = mdy(dailyActivity_combined$ActivityDate))
max(dailyActivity_DateCleaned$ActivityDate_cleaned) #to check if dates are formatted correctly
min(dailyActivity_DateCleaned$ActivityDate_cleaned)
Users are segmented based on their activity level using Heart Points(HP)
“Heart Points” is a health metric primarily used by Google Fit (developed in collaboration with the American Heart Association and the WHO) to measure physical activity that benefits your heart. Rather than just counting steps, it focuses on intensity. Here is the breakdown:
The Goal: The standard recommendation is to hit 150 Heart Points per week, which aligns with global health guidelines for reducing the risk of heart disease and improving mental well-being.
STEP1 - people are grouped into 3 groups based on their activity level.
| No_of_weeks_with_150_HP | group |
|---|---|
| >6 | Consistently Active |
| 3-5 | Ocassionally Active |
| 0-2 | Inactive |
activity_status <- dailyActivity_DateCleaned %>%
mutate(week = isoweek(ActivityDate_cleaned)) %>%
group_by(Id,week) %>%
summarise(total_hp = sum(VeryActiveMinutes)*2 + sum(FairlyActiveMinutes)) %>%
arrange(Id,week) %>%
mutate(active_week = ifelse(total_hp > 150,1,0)) %>%
group_by(Id) %>%
summarise(active_weeks = sum(active_week),
avg_weekly_hp = mean(total_hp),
sd_hp = sd(total_hp)
) %>%
mutate(group = case_when(active_weeks >= 6 ~ 'Consistently Active',
between(active_weeks,3,5) ~ 'Ocassionally Active',
between(active_weeks,0,2) ~ 'Inactive',
)
) %>%
arrange(-active_weeks)
activity_status
STEP 2 - Group level Descriptive Statics is calculated
Group_summary <- dailyActivity_DateCleaned %>%
inner_join(activity_status) %>%
group_by(group) %>%
summarise(across(c(TotalSteps, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, Calories, SedentaryMinutes),
list(mean = ~mean(.x),
sd = ~sd(.x),
CV = ~sd(.x)/mean(.x)
)
)
) %>%
left_join(group_count, by = 'group') %>%
arrange(-TotalSteps_mean)
knitr::kable(Group_summary)
| group | TotalSteps_mean | TotalSteps_sd | TotalSteps_CV | VeryActiveMinutes_mean | VeryActiveMinutes_sd | VeryActiveMinutes_CV | FairlyActiveMinutes_mean | FairlyActiveMinutes_sd | FairlyActiveMinutes_CV | LightlyActiveMinutes_mean | LightlyActiveMinutes_sd | LightlyActiveMinutes_CV | Calories_mean | Calories_sd | Calories_CV | SedentaryMinutes_mean | SedentaryMinutes_sd | SedentaryMinutes_CV | No_of_people |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Consistently Active | 11037.886 | 5037.274 | 0.4563622 | 46.122622 | 38.89035 | 0.8431948 | 20.581395 | 21.11626 | 1.025988 | 202.1607 | 78.30357 | 0.3873333 | 2512.562 | 839.4916 | 0.3341177 | 861.277 | 261.3623 | 0.3034591 | 11 |
| Ocassionally Active | 6726.277 | 4137.683 | 0.6151521 | 9.580220 | 16.29173 | 1.7005586 | 15.562637 | 21.75069 | 1.397622 | 188.8879 | 120.79497 | 0.6395061 | 2384.066 | 710.4784 | 0.2980112 | 1037.976 | 306.0314 | 0.2948348 | 11 |
| Inactive | 4141.190 | 3701.141 | 0.8937384 | 2.811947 | 10.90912 | 3.8795612 | 2.747788 | 10.43948 | 3.799232 | 167.5686 | 127.96586 | 0.7636626 | 1899.836 | 518.5674 | 0.2729537 | 1076.909 | 319.4748 | 0.2966590 | 11 |
Out of 33 individuals only 11 are consistently active.
Group_summary %>%
ggplot(aes(x=reorder(group, -No_of_people), y=No_of_people)) + geom_col() + labs(x='group')
Consistently active individuals are moving around twice then occasionally active and thrice then inactive individuals.
Group_summary %>%
ggplot(aes(x=reorder(group, -TotalSteps_mean), y=TotalSteps_mean)) + geom_col() + labs(x='group')
Very active minutes are exponentially high in consistently active
individuals.
Moderate activity levels are exponentially low in Inactive
individuals
light activity minutes decreased with decreasing activity levels but
not much difference
sedentary minutes are same of occasionally active and inactive and
lower of consistently active
Average weekly heart points for consistently active individuals is
exponentially high (more then 4 times the recommended minimum activity
levels)
activity_status %>%
filter(group == 'Consistently Active') %>%
ggplot(aes(y=avg_weekly_hp)) + geom_boxplot()
For occasionally active users this median drops to 200 which is
more then 150 but they are not consistently reaching 150+ across weeks,
some weeks they are more active and some weeks less
Inactive users are not fulfilling the minimum activity guidelines
23/33 people didn’t even log distance once
engagement <- dailyActivity_DateCleaned %>%
group_by(Id) %>%
summarise(No_of_times_logged = sum(ifelse(LoggedActivitiesDistance>0,1,0))) %>%
group_by(No_of_times_logged) %>%
summarise(No_of_people = n())
knitr::kable(engagement)
| No_of_times_logged | No_of_people |
|---|---|
| 0 | 28 |
| 1 | 2 |
| 7 | 1 |
| 19 | 1 |
| 23 | 1 |