This case study will produce a report with the following deliverables: [Date: 18-01-2026]

  1. A clear summary of the business task
  2. A description of all data sources used
  3. Documentation of any cleaning or manipulation of data
  4. A summary of your analysis
  5. Supporting visualizations and key findings
  6. Your top high-level content recommendations based on your analysis

ASK PHASE

Business Task:

“Analyze smart-device usage patterns from available fitness tracker data to identify trends and insights that can guide Bellabeat’s product development and marketing strategy for non-smart wellness products.”

Stakeholders:

  1. Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
  2. Sando Mur: Mathematician and Bellabeat’s cofounder

PREPARE

DATA IMPORTED FROM:

  1. Public dataset: Fitbit Fitness Tracker Data 🔗 (https://www.kaggle.com/datasets/arashnic/fitbit)
  2. The daily activity and sleep-day tables were combined so that activity and sleep can be read together for each user and day

importing data

fitbase_1 <- read.csv("~/coursera case study project/case stdy 2 _f/mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/dailyActivity_merged.csv")

fitbase_2 <- read.csv("~/coursera case study project/case stdy 2 _f/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")

colnames(fitbase_1) 
##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"

arranging data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
fitbase_1 <- fitbase_1 %>%
  mutate(ActivityDate = as.Date(ActivityDate, format = "%m/%d/%Y"))

fitbase_2 <- fitbase_2 %>%
  mutate(SleepDay = as.Date(SleepDay, format = "%m/%d/%Y %I:%M:%S %p")) %>%
  rename(ActivityDate = SleepDay)

fitbase <- left_join(fitbase_1, fitbase_2, by = c("Id", "ActivityDate"))

glimpse(fitbase)
## Rows: 457
## Columns: 18
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate             <date> 2016-03-25, 2016-03-26, 2016-03-27, 2016-03-…
## $ TotalSteps               <int> 11004, 17609, 12736, 13231, 12041, 10970, 122…
## $ TotalDistance            <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.…
## $ TrackerDistance          <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance       <dbl> 2.57, 6.92, 4.66, 3.19, 2.16, 2.36, 2.29, 3.3…
## $ ModeratelyActiveDistance <dbl> 0.46, 0.73, 0.16, 0.79, 1.09, 0.51, 0.49, 0.8…
## $ LightActiveDistance      <dbl> 4.07, 3.91, 3.71, 4.95, 4.61, 4.29, 5.04, 3.6…
## $ SedentaryActiveDistance  <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.0…
## $ VeryActiveMinutes        <int> 33, 89, 56, 39, 28, 30, 33, 47, 40, 15, 43, 3…
## $ FairlyActiveMinutes      <int> 12, 17, 5, 20, 28, 13, 12, 21, 11, 30, 18, 18…
## $ LightlyActiveMinutes     <int> 205, 274, 268, 224, 243, 223, 239, 200, 244, …
## $ SedentaryMinutes         <int> 804, 588, 605, 1080, 763, 1174, 820, 866, 636…
## $ Calories                 <int> 1819, 2154, 1944, 1932, 1886, 1820, 1889, 186…
## $ TotalSleepRecords        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ TotalMinutesAsleep       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ TotalTimeInBed           <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
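
The sleep columns above start with NA, so it helps to measure how many activity rows actually found a matching sleep record. The import paths point at two different export folders (3.12.16-4.11.16 and 4.12.16-5.12.16), which may explain a small overlap. The sketch below uses made-up toy tables (`activity_toy`, `sleep_toy` are hypothetical names, not the real data) to show one way to count the matches after a `left_join()`.

```r
library(dplyr)

# Toy stand-ins for fitbase_1 and fitbase_2 (values are invented for illustration)
activity_toy <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-04-11", "2016-04-12", "2016-04-12")),
  TotalSteps = c(9000L, 11000L, 4000L)
)
sleep_toy <- data.frame(
  Id = 1,
  ActivityDate = as.Date("2016-04-12"),
  TotalMinutesAsleep = 420L
)

# Same join keys as the real join above
joined_toy <- left_join(activity_toy, sleep_toy, by = c("Id", "ActivityDate"))

# Count rows with a sleep record and the share they represent
coverage <- joined_toy %>%
  summarise(
    rows = n(),
    rows_with_sleep = sum(!is.na(TotalMinutesAsleep)),
    share_with_sleep = mean(!is.na(TotalMinutesAsleep))
  )
coverage
```

Running the same `summarise()` on `fitbase` would show how much of the joined table can support sleep analysis.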

PROCESS

Date: 19-01-2026

▼ Tools used

  1. MySQL
  2. R (programming)
  3. Tableau
  4. Excel

▼ Dividing the process into two parts:

🔹 Part 1: Activity-based behavior (ALL USERS)
  •   Steps
  •   Calories
  •   Active minutes
  •   Sedentary time

🔹 Part 2: Sleep-based behavior (ONLY USERS WITH SLEEP DATA)
  •   Filter users/days where sleep is available
  •   Explicitly mention this limitation
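
A minimal sketch of the Part 2 filter, assuming that a non-missing `TotalMinutesAsleep` marks a day with sleep data. The toy table `fitbase_toy` and the name `sleep_subset` are hypothetical, and the values are invented.

```r
library(dplyr)

# Toy stand-in for the joined table (values are invented for illustration)
fitbase_toy <- data.frame(
  Id = c(1, 1, 2, 2),
  ActivityDate = as.Date(c("2016-04-11", "2016-04-12", "2016-04-11", "2016-04-12")),
  TotalSteps = c(9000L, 11000L, 4000L, 5000L),
  TotalMinutesAsleep = c(NA, 420L, NA, NA),
  TotalTimeInBed = c(NA, 450L, NA, NA)
)

# Part 2 works only on user-days where sleep was recorded
sleep_subset <- fitbase_toy %>%
  filter(!is.na(TotalMinutesAsleep))

# Numbers worth reporting alongside the limitation
n_users_all <- n_distinct(fitbase_toy$Id)
n_users_sleep <- n_distinct(sleep_subset$Id)
```

Reporting `n_users_sleep` next to `n_users_all` is one way to state the limitation explicitly.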

sorting fitbase using the arrange() function

sorted the tables
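
The sorting code is not shown above, so the following is only a sketch of what the `arrange()` call could look like, assuming the sort keys are `Id` and then `ActivityDate`. The toy table and the name `fitbase_sorted` are hypothetical.

```r
library(dplyr)

# Toy rows deliberately out of order (values are invented for illustration)
fitbase_toy <- data.frame(
  Id = c(2, 1, 1),
  ActivityDate = as.Date(c("2016-03-26", "2016-03-27", "2016-03-25")),
  TotalSteps = c(4000L, 12736L, 11004L)
)

# Sort by user first, then by date within each user
fitbase_sorted <- fitbase_toy %>%
  arrange(Id, ActivityDate)
fitbase_sorted
```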

duplicate checks

library(dplyr)
dup_check <- fitbase %>%
  count(Id, ActivityDate) %>%
  filter(n > 1)
dup_check
## [1] Id           ActivityDate n           
## <0 rows> (or 0-length row.names)
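
The check above returned zero rows, so nothing had to be removed here. For reference, this is a sketch of how duplicates could be dropped if the check had found any, assuming one row per `Id` and `ActivityDate` is the intended grain. The toy table is invented.

```r
library(dplyr)

# Toy table with one repeated Id/ActivityDate pair (values are invented)
toy_with_dups <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-12", "2016-04-12")),
  TotalSteps = c(9000L, 9000L, 4000L)
)

# Same duplicate check as above, now returning one row
dup_check_toy <- toy_with_dups %>%
  count(Id, ActivityDate) %>%
  filter(n > 1)

# Keep the first row for each Id/ActivityDate pair
toy_clean <- toy_with_dups %>%
  distinct(Id, ActivityDate, .keep_all = TRUE)
```

`.keep_all = TRUE` keeps the other columns of the first matching row instead of returning only the key columns.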

Data Validation → checking data types

str(fitbase)
## 'data.frame':    457 obs. of  18 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : Date, format: "2016-03-25" "2016-03-26" ...
##  $ TotalSteps              : int  11004 17609 12736 13231 12041 10970 12256 12262 11248 10016 ...
##  $ TotalDistance           : num  7.11 11.55 8.53 8.93 7.85 ...
##  $ TrackerDistance         : num  7.11 11.55 8.53 8.93 7.85 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  2.57 6.92 4.66 3.19 2.16 ...
##  $ ModeratelyActiveDistance: num  0.46 0.73 0.16 0.79 1.09 ...
##  $ LightActiveDistance     : num  4.07 3.91 3.71 4.95 4.61 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  33 89 56 39 28 30 33 47 40 15 ...
##  $ FairlyActiveMinutes     : int  12 17 5 20 28 13 12 21 11 30 ...
##  $ LightlyActiveMinutes    : int  205 274 268 224 243 223 239 200 244 314 ...
##  $ SedentaryMinutes        : int  804 588 605 1080 763 1174 820 866 636 655 ...
##  $ Calories                : int  1819 2154 1944 1932 1886 1820 1889 1868 1843 1850 ...
##  $ TotalSleepRecords       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ TotalMinutesAsleep      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ TotalTimeInBed          : int  NA NA NA NA NA NA NA NA NA NA ...
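
Checking types does not catch impossible values. As a possible extra validation step, the sketch below flags days whose tracked minutes add up to more than 1440 (the minutes in a day) or whose step count is negative. These two rules are assumptions chosen for illustration, and the toy values are invented.

```r
library(dplyr)

# Toy rows: the second one has an impossible number of tracked minutes
validation_toy <- data.frame(
  VeryActiveMinutes = c(33L, 89L),
  FairlyActiveMinutes = c(12L, 17L),
  LightlyActiveMinutes = c(205L, 274L),
  SedentaryMinutes = c(804L, 1500L),
  TotalSteps = c(11004L, 17609L)
)

# Flag rows that break either rule
invalid_rows <- validation_toy %>%
  mutate(
    total_minutes = VeryActiveMinutes + FairlyActiveMinutes +
      LightlyActiveMinutes + SedentaryMinutes
  ) %>%
  filter(total_minutes > 1440 | TotalSteps < 0)
invalid_rows
```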

21-01-2026 [ Analysis using R ]

library(dplyr)
user_summary <- fitbase %>%
  group_by(Id) %>%
  summarise(
    avg_steps = mean(TotalSteps, na.rm = TRUE),
    avg_calories = mean(Calories, na.rm = TRUE),
    avg_sedimentary_minutes = mean(SedentaryMinutes, na.rm = TRUE),
    avg_active_minutes = mean(VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes, na.rm = TRUE)
  )

str(user_summary)
## tibble [35 × 5] (S3: tbl_df/tbl/data.frame)
##  $ Id                     : num [1:35] 1.50e+09 1.62e+09 1.64e+09 1.84e+09 1.93e+09 ...
##  $ avg_steps              : num [1:35] 11641 4226 9275 3641 2181 ...
##  $ avg_calories           : num [1:35] 1796 1353 2916 1616 2254 ...
##  $ avg_sedimentary_minutes: num [1:35] 810 1278 1034 1035 953 ...
##  $ avg_active_minutes     : num [1:35] 280 122 286 160 113 ...
View(user_summary)

Visualization

library(ggplot2)

A. Average steps per user

ggplot(user_summary,
       aes(x = factor(Id), y = avg_steps )) +
  geom_col() +
  labs(
    title = "Average Daily Steps per User",
    x = "User ID",
    y = "Average Steps"
  ) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

B. Sedentary time vs steps

ggplot(user_summary, aes(x = avg_steps, y = avg_sedimentary_minutes)) +
  geom_point() +
  labs(
    title = "Sedentary Time vs Average Steps",
    x = "Average Steps",
    y = "Average Sedentary Minutes"
  )

C. Calories vs activity

ggplot(user_summary, aes(x = avg_active_minutes, y = avg_calories)) +
  geom_point() +
  labs(
    title = "Active Minutes vs Calories Burned",
    x = "Average Active Minutes",
    y = "Average Calories"
  )
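
The scatter plots in B and C show direction only visually. A correlation coefficient can put a number on each relationship. The sketch below uses an invented toy table with the same column names as `user_summary`, so the results say nothing about the real data.

```r
# Toy per-user summary (values are invented for illustration)
user_summary_toy <- data.frame(
  avg_steps = c(2000, 4000, 8000, 12000),
  avg_sedimentary_minutes = c(1200, 1100, 900, 800),
  avg_active_minutes = c(100, 150, 250, 300),
  avg_calories = c(1500, 1800, 2200, 2600)
)

# Pearson correlation for plot B (steps vs sedentary minutes)
cor_steps_sedentary <- cor(
  user_summary_toy$avg_steps,
  user_summary_toy$avg_sedimentary_minutes
)

# Pearson correlation for plot C (active minutes vs calories)
cor_active_calories <- cor(
  user_summary_toy$avg_active_minutes,
  user_summary_toy$avg_calories
)
```

With only 35 users in `user_summary`, any coefficient from the real data should be read as a rough indication rather than a firm result.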

🚀 BUSINESS RECOMMENDATIONS

✅ Recommendation 1: Habit-building over performance. Since most users show moderate activity with high sedentary time, Bellabeat should emphasize daily habit formation rather than intense fitness goals.

✅ Recommendation 2: Non-smart product positioning. Non-smart wellness products (journals, hydration reminders, mindfulness tools) should target sedentary users, positioning wellness as achievable without technology overload.

✅ Recommendation 3: Marketing messaging. Campaigns should focus on “small daily progress”, aligning with users who are not highly active but consistently engaged.

✅ Recommendation 4: Sleep features (future opportunity). Given the limited sleep data, Bellabeat can differentiate by encouraging manual sleep reflection in non-smart products to complement wearable insights.