OBJECTIVE: To analyze user activity patterns and engagement behavior in Bellabeat fitness tracker data.

KEY INSIGHTS

DATA CLEANING

INSTALLING PAKAGES

library(tidyverse)
library(rmarkdown)

DATA IMPORT

dailyActivity_March_April <- read_csv("mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/dailyActivity_merged.csv")

dailyActivity_April_May <- read_csv("mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")

COMBINE DATA

Data from two months (March-April and April-May) are combined into one file Uncommon ID’s are removed

common_ids <- intersect(dailyActivity_March_April$Id,
                 dailyActivity_April_May$Id)

dailyActivity_March_April_IDCleaned <- dailyActivity_March_April %>% 
  filter(Id %in% common_ids)

setequal(dailyActivity_March_April_IDCleaned$Id, dailyActivity_April_May$Id) #True

dailyActivity_combined <- bind_rows(dailyActivity_March_April_IDCleaned, dailyActivity_April_May)

CLEAN DATE

Date which is originally formatted as string is formatted to Date

dailyActivity_DateCleaned <- dailyActivity_combined %>% 
  mutate(ActivityDate_cleaned = mdy(dailyActivity_combined$ActivityDate))

max(dailyActivity_DateCleaned$ActivityDate_cleaned) #to check if dates are formatted correctly 
min(dailyActivity_DateCleaned$ActivityDate_cleaned)

DATA ANALYSIS

SEGMENTATION

Users are segmented based on their activity level using Heart Points(HP)

“Heart Points” is a health metric primarily used by Google Fit (developed in collaboration with the American Heart Association and the WHO) to measure physical activity that benefits your heart. Rather than just counting steps, it focuses on intensity. Here is the breakdown:

  • Moderate Activity: You earn 1 Heart Point for every minute of brisk walking or similar activity where your heart rate increases.
  • Intense Activity: You earn 2 Heart Points for every minute of higher-intensity activities like running, cycling, or rowing.

The Goal: The standard recommendation is to hit 150 Heart Points per week, which aligns with global health guidelines for reducing the risk of heart disease and improving mental well-being.

STEP1 - people are grouped into 3 groups based on their activity level.

No_of_weeks_with_150_HP group
>6 Consistently Active
3-5 Ocassionally Active
0-2 Inactive
activity_status <- dailyActivity_DateCleaned %>%
  mutate(week = isoweek(ActivityDate_cleaned)) %>% 
  group_by(Id,week) %>% 
  summarise(total_hp = sum(VeryActiveMinutes)*2 + sum(FairlyActiveMinutes)) %>% 
  arrange(Id,week) %>% 
  mutate(active_week = ifelse(total_hp > 150,1,0)) %>%
  group_by(Id) %>% 
  summarise(active_weeks = sum(active_week), 
            avg_weekly_hp = mean(total_hp), 
            sd_hp = sd(total_hp)
            ) %>%
  mutate(group = case_when(active_weeks >= 6 ~ 'Consistently Active',
                           between(active_weeks,3,5) ~ 'Ocassionally Active',
                           between(active_weeks,0,2) ~ 'Inactive',
                          
                           )
         ) %>% 
   arrange(-active_weeks)
activity_status 

STEP 2 - Group level Descriptive Statics is calculated

Group_summary <- dailyActivity_DateCleaned %>% 
  inner_join(activity_status) %>% 
  group_by(group) %>% 
  summarise(across(c(TotalSteps, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, Calories, SedentaryMinutes),
                   list(mean = ~mean(.x),
                        sd = ~sd(.x), 
                        CV = ~sd(.x)/mean(.x)
                        )
                   )
            ) %>% 
  left_join(group_count, by = 'group') %>%  
  arrange(-TotalSteps_mean)
knitr::kable(Group_summary)
group TotalSteps_mean TotalSteps_sd TotalSteps_CV VeryActiveMinutes_mean VeryActiveMinutes_sd VeryActiveMinutes_CV FairlyActiveMinutes_mean FairlyActiveMinutes_sd FairlyActiveMinutes_CV LightlyActiveMinutes_mean LightlyActiveMinutes_sd LightlyActiveMinutes_CV Calories_mean Calories_sd Calories_CV SedentaryMinutes_mean SedentaryMinutes_sd SedentaryMinutes_CV No_of_people
Consistently Active 11037.886 5037.274 0.4563622 46.122622 38.89035 0.8431948 20.581395 21.11626 1.025988 202.1607 78.30357 0.3873333 2512.562 839.4916 0.3341177 861.277 261.3623 0.3034591 11
Ocassionally Active 6726.277 4137.683 0.6151521 9.580220 16.29173 1.7005586 15.562637 21.75069 1.397622 188.8879 120.79497 0.6395061 2384.066 710.4784 0.2980112 1037.976 306.0314 0.2948348 11
Inactive 4141.190 3701.141 0.8937384 2.811947 10.90912 3.8795612 2.747788 10.43948 3.799232 167.5686 127.96586 0.7636626 1899.836 518.5674 0.2729537 1076.909 319.4748 0.2966590 11

INSIGHTS

Out of 33 individuals only 11 are consistently active.

Group_summary %>% 
  ggplot(aes(x=reorder(group, -No_of_people), y=No_of_people)) + geom_col() + labs(x='group')

TOTAL STEPS

Consistently active individuals are moving around twice then occasionally active and thrice then inactive individuals.

Group_summary %>% 
  ggplot(aes(x=reorder(group, -TotalSteps_mean), y=TotalSteps_mean)) + geom_col() + labs(x='group')

HIGH INTENSITY ACTIVITY

Very active minutes are exponentially high in consistently active individuals.

MODERATE INTENSITY ACTIVITY

Moderate activity levels are exponentially low in Inactive individuals

LOW INTENSITY ACTIVITY

light activity minutes decreased with decreasing activity levels but not much difference

SEDENTARY TIME

sedentary minutes are same of occasionally active and inactive and lower of consistently active

HEART POINTS

Average weekly heart points for consistently active individuals is exponentially high (more then 4 times the recommended minimum activity levels)

Average weekly HP distribution within groups

  • for consistently active users median of avg weekly HP is around 600 and goes as high as around 1200
activity_status %>% 
  filter(group == 'Consistently Active') %>% 
  ggplot(aes(y=avg_weekly_hp)) + geom_boxplot()

  • For occasionally active users this median drops to 200 which is more then 150 but they are not consistently reaching 150+ across weeks, some weeks they are more active and some weeks less

  • Inactive users are not fulfilling the minimum activity guidelines

ENGAGEMENT ANALYSIS

23/33 people didn’t even log distance once

engagement <- dailyActivity_DateCleaned %>% 
  group_by(Id) %>% 
  summarise(No_of_times_logged = sum(ifelse(LoggedActivitiesDistance>0,1,0))) %>% 
  group_by(No_of_times_logged) %>% 
  summarise(No_of_people = n())
knitr::kable(engagement)
No_of_times_logged No_of_people
0 28
1 2
7 1
19 1
23 1