Bellabeat Case Study (with R)

Background

There are two key people at the company:

- Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
- Sando Mur: Mathematician and Bellabeat’s cofounder

Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data to gain insight into how people are already using their smart devices. Using this information, she would like high-level recommendations for how these trends can inform Bellabeat’s marketing strategy.

Ask:

1. What are some trends in smart device usage?

According to an article published on August 29, 2024 (https://psico-smart.com/en/blogs/blog-wearable-technology-trends-and-innovations-in-health-and-wellness-tracking-173308):

Companies like Fitbit and Apple have become household names, with studies showing that 31% of U.S. adults now own a smartwatch or fitness tracker. A 2021 study published in the Journal of Medical Internet Research revealed that users of wearable devices demonstrated a 30% increase in physical activity levels and a 25% reduction in healthcare costs through proactive health monitoring. With features like ECG readings, sleep tracking, and even blood oxygen level monitoring, wearables enable users to be more informed about their health than ever before.

The integration of Artificial Intelligence (AI) further extends their capabilities. For example, smartwatches can now learn from user behavior to provide personalized health insights. According to the article, 74% of wearable technology users reported improved health awareness, highlighting the potential of wearables to transform our approach to fitness and wellness, and a survey found that 60% of users who tracked their sleep reported improved sleep quality.

Let’s find out by analysing data.

2. How could these trends apply to Bellabeat customers?

The data I use includes daily averages per participant for total steps taken, distance, minutes at each activity level, calories, minutes asleep, time in bed and weight in kg.

3. How could these trends help influence Bellabeat marketing strategy?

Please see the conclusion at the end.

Prepare:

Download the data from FitBit Fitness Tracker Data (https://www.kaggle.com/datasets/arashnic/fitbit/data), then load the files from the zip archive (4.12.16-5.12.16) into RStudio.

library("tidyverse")

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readr)
dailyActivity_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")

## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(dailyActivity_merged)

library(readr)
dailyCalories_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")

## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(dailyCalories_merged)

library(readr)
dailyIntensities_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")

## Rows: 940 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (9): Id, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, Ve...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(dailyIntensities_merged)

library(readr)
dailySteps_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")

## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(dailySteps_merged)

library(readr)
heartrate_seconds_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")

## Rows: 2483658 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Time
## dbl (2): Id, Value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(heartrate_seconds_merged)

library(readr)
hourlyCalories_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")

## Rows: 22099 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(hourlyCalories_merged)

library(readr)
hourlyIntensities_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")

## Rows: 22099 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (3): Id, TotalIntensity, AverageIntensity
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(hourlyIntensities_merged)

library(readr)
hourlySteps_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")

## Rows: 22099 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(hourlySteps_merged)

library(readr)
minuteCaloriesNarrow_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteCaloriesNarrow_merged.csv")

## Rows: 1325580 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(minuteCaloriesNarrow_merged)

library(readr)
minuteCaloriesWide_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteCaloriesWide_merged.csv")

## Rows: 21645 Columns: 62
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityHour
## dbl (61): Id, Calories00, Calories01, Calories02, Calories03, Calories04, Ca...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(minuteCaloriesWide_merged)

library(readr)
minuteIntensitiesNarrow_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteIntensitiesNarrow_merged.csv")

## Rows: 1325580 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, Intensity
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(minuteIntensitiesNarrow_merged)

library(readr)
minuteIntensitiesWide_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteIntensitiesWide_merged.csv")

## Rows: 21645 Columns: 62
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityHour
## dbl (61): Id, Intensity00, Intensity01, Intensity02, Intensity03, Intensity0...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(minuteIntensitiesWide_merged)

library(readr)
minuteMETsNarrow_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteMETsNarrow_merged.csv")

## Rows: 1325580 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, METs
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(minuteMETsNarrow_merged)

library(readr)
minuteSleep_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteSleep_merged.csv")

## Rows: 188521 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): date
## dbl (3): Id, value, logId
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(minuteSleep_merged)

library(readr)
minuteStepsNarrow_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteStepsNarrow_merged.csv")

## Rows: 1325580 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, Steps
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(minuteStepsNarrow_merged)

library(readr)
sleepDay_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")

## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(sleepDay_merged)

library(readr)
weightLogInfo_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")

## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(weightLogInfo_merged)
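The one-file-at-a-time loading above could also be done in a loop. The following is only a sketch: the folder and the two tiny CSV files are created here purely for illustration and stand in for the real "Fitabase Data 4.12.16-5.12.16" directory.

```r
library(readr)

# Illustrative setup: two tiny CSV files in a temporary folder stand in
# for the real data directory (these values are made up).
data_dir <- file.path(tempdir(), "fitabase_demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(data.frame(Id = c(1, 2), Calories = c(1800, 2100)),
          file.path(data_dir, "dailyCalories_merged.csv"))
write_csv(data.frame(Id = c(1, 2), StepTotal = c(9000, 4000)),
          file.path(data_dir, "dailySteps_merged.csv"))

# Find every CSV file and name each path after its file (without ".csv")
csv_paths <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
names(csv_paths) <- tools::file_path_sans_ext(basename(csv_paths))

# Read all files into one named list of data frames
datasets <- lapply(csv_paths, read_csv, show_col_types = FALSE)

names(datasets)
```

Each dataset is then available by name, for example datasets$dailySteps_merged, and show_col_types = FALSE silences the column specification messages.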

Check what information each dataset contains (by column names)

colnames(dailyActivity_merged)

##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"

colnames(dailyCalories_merged)

## [1] "Id"          "ActivityDay" "Calories"

The columns of dailyCalories_merged are already in dailyActivity_merged

Erase dailyCalories_merged from the list

Continue checking the information in each dataset (by column names)

colnames(dailyIntensities_merged)

##  [1] "Id"                       "ActivityDay"             
##  [3] "SedentaryMinutes"         "LightlyActiveMinutes"    
##  [5] "FairlyActiveMinutes"      "VeryActiveMinutes"       
##  [7] "SedentaryActiveDistance"  "LightActiveDistance"     
##  [9] "ModeratelyActiveDistance" "VeryActiveDistance"

The columns of dailyIntensities_merged are already in dailyActivity_merged

Erase dailyIntensities_merged from the list

Continue checking the information in each dataset (by column names)

colnames(dailySteps_merged)

## [1] "Id"          "ActivityDay" "StepTotal"

The columns of dailySteps_merged are already in dailyActivity_merged (as “TotalSteps”)

Erase dailySteps_merged from the list
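Comparing column names shows that the same fields exist, but not that the values match. One possible check, sketched here on made-up rows that only mimic the two layouts, is an anti-join: if no row of the smaller table is left unmatched, it adds nothing new.

```r
library(dplyr)

# Illustrative stand-ins for dailyActivity_merged and dailySteps_merged
activity <- tibble(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(13162, 10735, 8163),
  Calories = c(1985, 1797, 1432)
)
steps <- tibble(
  Id = c(1, 1, 2),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),
  StepTotal = c(13162, 10735, 8163)
)

# Rows of `steps` with no identical match in `activity`;
# zero rows means the smaller table is fully contained in the larger one
unmatched <- anti_join(
  steps, activity,
  by = c("Id", "ActivityDay" = "ActivityDate", "StepTotal" = "TotalSteps")
)
nrow(unmatched)
```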

Continue checking the information in each dataset (by column names)

colnames(heartrate_seconds_merged)

## [1] "Id"    "Time"  "Value"

heartrate_seconds_merged is too detailed (recorded by the second), so I will skip it for this analysis

I opened the “hourly” and “minute” datasets

The “Narrow” and “Wide” files contain the same data in different layouts
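To illustrate how the two layouts relate, here is a small sketch that reshapes a wide row (one column per minute) into the narrow form (one row per minute). The values are invented, and the column names only imitate the CalorieNN pattern seen in minuteCaloriesWide_merged.

```r
library(tidyr)

# Illustrative wide layout: one row per hour, one column per minute
wide <- data.frame(
  Id = 1,
  ActivityHour = "4/13/2016 12:00:00 AM",
  Calories00 = 0.8, Calories01 = 0.9, Calories02 = 1.1
)

# Narrow layout: one row per minute
narrow <- pivot_longer(
  wide,
  cols = starts_with("Calories"),
  names_to = "minute",
  names_prefix = "Calories",
  values_to = "Calories"
)
narrow
```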

For now, I will skip these detailed datasets to focus on the overall data

In minuteMETsNarrow_merged, “MET” stands for metabolic equivalent: “One metabolic equivalent (MET) is defined as the amount of oxygen consumed while sitting at rest and is equal to 3.5 ml O2 per kg body weight x min.” (found via a Google search)
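As a worked example of that definition, the sketch below converts METs into oxygen consumption. The helper name and the 70 kg body weight are assumptions chosen only for illustration.

```r
# 1 MET = 3.5 ml of oxygen per kg of body weight per minute (definition above)
ml_o2_per_kg_min <- 3.5

# Hypothetical helper: oxygen consumed (ml per minute) at a given MET level
oxygen_ml_per_min <- function(weight_kg, mets = 1) {
  mets * ml_o2_per_kg_min * weight_kg
}

at_rest <- oxygen_ml_per_min(70)         # 245 ml per minute at 1 MET
at_four_mets <- oxygen_ml_per_min(70, 4) # 980 ml per minute at 4 METs
```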

Erase the datasets above and skip them to keep the workspace easier to organize

Now only three datasets remain: dailyActivity_merged, sleepDay_merged and weightLogInfo_merged

To make them easier to work with, rename each dataset

dailyActivity <- dailyActivity_merged
sleepDay <- sleepDay_merged
weightLog <- weightLogInfo_merged

To make the workspace easier to read, erase the “_merged” objects
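Erasing can also be done in code rather than by hand. This sketch uses a scratch environment with placeholder objects so that it does not touch a real workspace; in an interactive session the equivalent call would be rm(list = ls(pattern = "_merged$")).

```r
# Scratch environment with placeholder objects (illustrative only)
demo_env <- new.env()
assign("dailySteps_merged", data.frame(Id = 1), envir = demo_env)
assign("sleepDay_merged", data.frame(Id = 1), envir = demo_env)
assign("sleepDay", data.frame(Id = 1), envir = demo_env)

# Remove every object whose name ends in "_merged"
merged_names <- ls(envir = demo_env, pattern = "_merged$")
rm(list = merged_names, envir = demo_env)

remaining <- ls(envir = demo_env)
remaining
```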

Clean column names to make them standard

library("janitor")

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

dailyActivity <- clean_names(dailyActivity)
sleepDay <- clean_names(sleepDay)
weightLog <- clean_names(weightLog)

Confirm cleaned column names

colnames(dailyActivity)

##  [1] "id"                         "activity_date"             
##  [3] "total_steps"                "total_distance"            
##  [5] "tracker_distance"           "logged_activities_distance"
##  [7] "very_active_distance"       "moderately_active_distance"
##  [9] "light_active_distance"      "sedentary_active_distance" 
## [11] "very_active_minutes"        "fairly_active_minutes"     
## [13] "lightly_active_minutes"     "sedentary_minutes"         
## [15] "calories"

colnames(sleepDay)

## [1] "id"                   "sleep_day"            "total_sleep_records" 
## [4] "total_minutes_asleep" "total_time_in_bed"

colnames(weightLog)

## [1] "id"               "date"             "weight_kg"        "weight_pounds"   
## [5] "fat"              "bmi"              "is_manual_report" "log_id"

For “date” to be consistent, let’s rename “activity_date” in dailyActivity and “sleep_day” in sleepDay to “date”

install.packages('dplyr', repos='http://cran.us.r-project.org')

## 
## The downloaded binary packages are in
##  /var/folders/9w/zl26y91d2yndrh40yqf7_wxr0000gn/T//RtmpxtrMmT/downloaded_packages

library("dplyr")

dailyActivity <- dailyActivity %>% 
  rename(date = activity_date)
sleepDay <- sleepDay %>%
  rename(date = sleep_day)

In dailyActivity, the column names use both “moderately” and “fairly” for the same activity level

To be consistent, rename “fairly_active_minutes” to “moderately_active_minutes”

Also, there are “sedentary_active_distance” and “sedentary_minutes”

To be consistent, rename “sedentary_active_distance” to “sedentary_distance” to match “sedentary_minutes”

dailyActivity <- dailyActivity %>% 
  rename(moderately_active_minutes = fairly_active_minutes, sedentary_distance = sedentary_active_distance)

Standardize the date format

install.packages("lubridate", repos = "http://cran.us.r-project.org")

## 
## The downloaded binary packages are in
##  /var/folders/9w/zl26y91d2yndrh40yqf7_wxr0000gn/T//RtmpxtrMmT/downloaded_packages

library("lubridate")

dailyActivity$date <- as.Date(dailyActivity$date, format = "%m/%d/%Y")
View(dailyActivity)

sleepDay and weightLog have both date and time. We only need the date for this analysis.

Convert the date-time column to a date

sleepDay$date <- as.Date(sleepDay$date, format = "%m/%d/%Y")
View(sleepDay)
weightLog$date <- as.Date(weightLog$date, format = "%m/%d/%Y")
View(weightLog)
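The same date-only format works for both kinds of column because parsing stops once the month, day and year have been read, and the trailing time is ignored. A minimal sketch, using sample strings that are assumed to match the layouts in the raw files:

```r
# Sample strings in the two assumed layouts (date only, and date with time)
activity_dates <- c("4/12/2016", "5/2/2016")
sleep_dates <- c("4/12/2016 12:00:00 AM", "5/2/2016 12:00:00 AM")

# A date-only format reads the leading month/day/year and ignores the rest
parsed_activity <- as.Date(activity_dates, format = "%m/%d/%Y")
parsed_sleep <- as.Date(sleep_dates, format = "%m/%d/%Y")
parsed_sleep

# A failed parse becomes NA, so counting NAs is a quick sanity check
n_failed <- sum(is.na(parsed_sleep))
```

Checking for NA values after the conversion is a cheap way to catch rows that did not follow the expected layout.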

Let’s check again what kind of data we have in each dataset using colnames()

colnames(dailyActivity)

##  [1] "id"                         "date"                      
##  [3] "total_steps"                "total_distance"            
##  [5] "tracker_distance"           "logged_activities_distance"
##  [7] "very_active_distance"       "moderately_active_distance"
##  [9] "light_active_distance"      "sedentary_distance"        
## [11] "very_active_minutes"        "moderately_active_minutes" 
## [13] "lightly_active_minutes"     "sedentary_minutes"         
## [15] "calories"

colnames(sleepDay)

## [1] "id"                   "date"                 "total_sleep_records" 
## [4] "total_minutes_asleep" "total_time_in_bed"

colnames(weightLog)

## [1] "id"               "date"             "weight_kg"        "weight_pounds"   
## [5] "fat"              "bmi"              "is_manual_report" "log_id"

In dailyActivity, we are not 100% sure whether the distances are in “mile” or in “meter”

So let’s remove the detailed “distance” columns and focus on “minutes”

From the colnames() output, we don’t need columns 5 to 10 (keep “total_distance”, column 4, just in case)

dailyActivity_2 <- dailyActivity[, -c(5:10)]

In sleepDay, I don’t know what “total_sleep_records” means, so let’s remove this column.

sleepDay_2 <- sleepDay[, -c(3)]

In weightLog, let’s focus on weight in “kg” and remove “weight_pounds”, “fat”, “bmi”, “is_manual_report” and “log_id”.

weightLog_2 <- weightLog[, c(1:3)]
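Dropping columns by position works, but it silently breaks if the column order ever changes. As an alternative sketch, the same columns can be dropped by name with dplyr. The one-row table below is made up and only reuses the cleaned column names.

```r
library(dplyr)

# Illustrative one-row table with the cleaned column names (values made up)
activity_demo <- tibble(
  id = 1, date = as.Date("2016-04-12"), total_steps = 13162,
  total_distance = 8.5, tracker_distance = 8.5,
  logged_activities_distance = 0, very_active_distance = 1.88,
  moderately_active_distance = 0.55, light_active_distance = 6.06,
  sedentary_distance = 0, very_active_minutes = 25,
  moderately_active_minutes = 13, lightly_active_minutes = 328,
  sedentary_minutes = 728, calories = 1985
)

# Drop the run of columns from tracker_distance to sedentary_distance,
# which keeps total_distance and all "minutes" columns
activity_trimmed <- activity_demo %>%
  select(-c(tracker_distance:sedentary_distance))

colnames(activity_trimmed)
```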

Next, let’s calculate the averages by id.

install.packages("dplyr", repos = "http://cran.us.r-project.org")

## 
## The downloaded binary packages are in
##  /var/folders/9w/zl26y91d2yndrh40yqf7_wxr0000gn/T//RtmpxtrMmT/downloaded_packages

library("dplyr")

dailyActivity_avg_by_id <- dailyActivity_2 %>%
  group_by(id) %>%
  reframe(avg_ttl_steps = mean(total_steps),
          avg_distance = mean(total_distance),
          avg_very_act_min = mean(very_active_minutes),
          avg_mod_act_min = mean(moderately_active_minutes),
          avg_light_act_min = mean(lightly_active_minutes),
          avg_sedentary_min = mean(sedentary_minutes),
          ave_cal = mean(calories))   

sleepDay_avg_by_id <- sleepDay_2 %>%
  group_by(id) %>%
  reframe(avg_minutes_asleep = mean(total_minutes_asleep),
          avg_time_in_bed = mean(total_time_in_bed))  

weightLog_avg_by_id <- weightLog_2 %>%
  group_by(id) %>%
  reframe(avg_weight_kg = mean(weight_kg))
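When many columns need the same summary, the per-column calls can be collapsed with across(). This is a sketch on invented rows, not the method used above; summarise() is used here because each group yields exactly one row.

```r
library(dplyr)

# Illustrative daily rows for two participants (values made up)
daily_demo <- tibble(
  id = c(1, 1, 2, 2),
  total_steps = c(10000, 12000, 3000, 5000),
  calories = c(2000, 2200, 1500, 1700)
)

# One mean per participant for each listed column, prefixed with "avg_"
avg_demo <- daily_demo %>%
  group_by(id) %>%
  summarise(across(c(total_steps, calories), mean, .names = "avg_{.col}"),
            .groups = "drop")
avg_demo
```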

To make it easy to read, round values except ‘id’ column to 2 decimal places.

library(dplyr)

# Pass round() through an anonymous function: the `...` argument of across()
# is deprecated as of dplyr 1.1.0
dailyActivity_avg_by_id_2 <- dailyActivity_avg_by_id %>% mutate(across(-id, \(x) round(x, 2)))
sleepDay_avg_by_id_2 <- sleepDay_avg_by_id %>% mutate(across(-id, \(x) round(x, 2)))
weightLog_avg_by_id_2 <- weightLog_avg_by_id %>% mutate(across(-id, \(x) round(x, 2)))

Let’s check the data type in each column.

glimpse(dailyActivity_avg_by_id_2)

## Rows: 33
## Columns: 8
## $ id                <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1927…
## $ avg_ttl_steps     <dbl> 12116.74, 5743.90, 7282.97, 2580.06, 916.13, 11370.6…
## $ avg_distance      <dbl> 7.81, 3.91, 5.30, 1.71, 0.63, 8.08, 3.45, 3.19, 6.36…
## $ avg_very_act_min  <dbl> 38.71, 8.68, 9.57, 0.13, 1.32, 36.29, 0.10, 1.35, 13…
## $ avg_mod_act_min   <dbl> 19.16, 5.81, 21.37, 1.29, 0.77, 19.35, 0.26, 2.58, 2…
## $ avg_light_act_min <dbl> 219.94, 153.48, 178.47, 115.45, 38.58, 257.45, 256.6…
## $ avg_sedentary_min <dbl> 848.16, 1257.74, 1161.87, 1206.61, 1317.42, 1112.58,…
## $ ave_cal           <dbl> 1816.42, 1483.35, 2811.30, 1573.48, 2172.81, 2509.97…

glimpse(sleepDay_avg_by_id)

## Rows: 24
## Columns: 3
## $ id                 <dbl> 1503960366, 1644430081, 1844505072, 1927972279, 202…
## $ avg_minutes_asleep <dbl> 360.2800, 294.0000, 652.0000, 417.0000, 506.1786, 6…
## $ avg_time_in_bed    <dbl> 383.2000, 346.0000, 961.0000, 437.8000, 537.6429, 6…

glimpse(weightLog_avg_by_id)

## Rows: 8
## Columns: 2
## $ id            <dbl> 1503960366, 1927972279, 2873212765, 4319703577, 45586099…
## $ avg_weight_kg <dbl> 52.60000, 133.50000, 57.00000, 72.35000, 69.64000, 90.70…

Finally, let’s merge the three datasets above

Joined_AS_Avg <- left_join(dailyActivity_avg_by_id_2,sleepDay_avg_by_id_2)

## Joining with `by = join_by(id)`

View(Joined_AS_Avg)
Joined_ASW_Avg <- left_join(Joined_AS_Avg, weightLog_avg_by_id_2)

## Joining with `by = join_by(id)`

View(Joined_ASW_Avg)
colnames(Joined_ASW_Avg)

##  [1] "id"                 "avg_ttl_steps"      "avg_distance"      
##  [4] "avg_very_act_min"   "avg_mod_act_min"    "avg_light_act_min" 
##  [7] "avg_sedentary_min"  "ave_cal"            "avg_minutes_asleep"
## [10] "avg_time_in_bed"    "avg_weight_kg"
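Because the activity table has 33 ids while the sleep and weight tables have only 24 and 8, a left join keeps every activity row and fills the missing sleep and weight values with NA. A small sketch with made-up tables shows the effect and how to count the gaps:

```r
library(dplyr)

# Illustrative averages: participant 2 has no sleep record
activity_avg <- tibble(id = c(1, 2, 3), avg_ttl_steps = c(12000, 5700, 7300))
sleep_avg <- tibble(id = c(1, 3), avg_minutes_asleep = c(360, 294))

# Every activity row is kept; unmatched ids get NA in the sleep column
joined <- left_join(activity_avg, sleep_avg, by = "id")
joined

# Participants with activity data but no sleep data
n_missing_sleep <- sum(is.na(joined$avg_minutes_asleep))
```

Any later calculation on the joined sleep or weight columns needs to account for these NA values, for example with na.rm = TRUE.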

Analyze and Share

Let’s sort the ave_cal column from highest to lowest to see the highest and lowest values.
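Sorting can be sketched as follows. The three ave_cal values are the first three shown in the glimpse() output earlier, while the ids are placeholders; in the real session the same calls would be applied to Joined_ASW_Avg.

```r
library(dplyr)

# First three ave_cal values from the glimpse() output; ids are placeholders
cal_demo <- tibble(id = c(1, 2, 3), ave_cal = c(1816.42, 1483.35, 2811.30))

# Highest average calories first
sorted_demo <- cal_demo %>% arrange(desc(ave_cal))
sorted_demo

# Lowest and highest value in one call
cal_range <- range(cal_demo$ave_cal)
```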

library("tidyverse")

# Scatter plot with a smoothed trend line (loess by default)
ggplot(data = Joined_ASW_Avg, aes(x = ave_cal, y = avg_ttl_steps)) + 
  geom_point() +
  ggtitle("Relationship between Average Calories Burnt and Total Steps Taken", subtitle = "(Per Day)") +
  xlab("Average Calories Burnt") +
  ylab("Average Total Steps") +
  geom_smooth()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

# The distance unit was not confirmed above, so the axis label omits it
ggplot(data = Joined_ASW_Avg, aes(x = ave_cal, y = avg_distance)) + 
  geom_point() +
  ggtitle("Relationship between Average Calories Burnt and Average Distance Walked", subtitle = "(Per Day)") +
  xlab("Average Calories Burnt") +
  ylab("Average Distance Walked") +
  geom_smooth()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(data = Joined_ASW_Avg, aes(x = ave_cal, y = avg_very_act_min)) + 
  geom_point() +
  ggtitle("Relationship between Average Calories Burnt and Average Very Active Minutes", subtitle = "(Per Day)") +
  xlab("Average Calories Burnt") +
  ylab("Average Very Active Minutes") +
  geom_smooth()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

To see how people are burning calories, I made three visualizations that compare “Average Calories Burnt per day” with “Average Total Steps taken”, “Average Distance Walked” and “Average Very Active Minutes”.

It was unexpected and interesting to see a dip in the smoothed line between about 2,000 and 2,250 average calories burnt. My assumption is that the dip comes from the small sample (33 participants) and from each body burning calories at a different rate. I would expect the dip to smooth out with more participants. Overall, however, the smoothed line trends upward: participants who take more steps, walk farther and have more very active minutes tend to burn more calories.
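The upward trend read from the plots could also be summarized as a single number with a correlation coefficient. The sketch below uses invented values, not the study data. A rank-based (Spearman) correlation is chosen here as an assumption, since it is less sensitive to a few unusual participants in a small sample.

```r
# Invented per-participant averages, for illustration only
ave_cal <- c(1500, 1800, 2100, 2500, 3000)
avg_ttl_steps <- c(3000, 6000, 5500, 9000, 11000)

# Linear (Pearson) and rank-based (Spearman) correlation
pearson_r <- cor(ave_cal, avg_ttl_steps)
spearman_rho <- cor(ave_cal, avg_ttl_steps, method = "spearman")
spearman_rho
```

In the real session the two vectors would be Joined_ASW_Avg$ave_cal and Joined_ASW_Avg$avg_ttl_steps. A value close to 1 would support the upward trend, and a value near 0 would suggest the smoothed line is mostly noise.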

“Burning calories” has been a popular subject among women. To differentiate Bellabeat products and go a step further than other smart devices, I suggest creating a mechanism that analyzes the collected data and lets each user select a focus, such as “number of steps”, “distance” or “activity level”, so that the device can give recommendations and suggestions for improvement. This would be a good opportunity to use the benefits of AI.

The recommended amount of sleep is 7 to 9 hours, which is 420 to 540 minutes.

To make it easier to see in the bar chart, mark the recommended range (in minutes) in light blue

install.packages("ggplot2", repos = "http://cran.us.r-project.org")

## 
## The downloaded binary packages are in
##  /var/folders/9w/zl26y91d2yndrh40yqf7_wxr0000gn/T//RtmpxtrMmT/downloaded_packages

library("ggplot2")

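# Treat id as a category so each participant gets one evenly spaced bar
ggplot(data = sleepDay_avg_by_id_2, aes(x = factor(id), y = avg_minutes_asleep)) + 
  geom_hline(yintercept = 420:540, colour = "lightblue") +
  geom_col() +
  ggtitle("Average Sleep per Day by ID") +
  xlab("ID") +
  ylab("Average Minutes Asleep") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))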

Count how many ids average at or above the recommended minimum sleep time (420 minutes)

sum(sleepDay_avg_by_id_2$avg_minutes_asleep >= 420)

## [1] 12

After visualizing sleepDay_avg_by_id_2, I counted the ids whose average sleep time is at or above the recommended minimum (in minutes).

Of the 24 ids, 12 average at or above the recommended minimum, which is half of the participants.
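The same count can be expressed as a proportion, and it can be narrowed to the 420 to 540 minute range, since a count of values at or above 420 also includes anyone averaging more than 9 hours. The values below are invented for illustration and are not the study results.

```r
# Invented per-participant averages in minutes (not the study values)
avg_minutes_asleep <- c(360, 294, 652, 417, 506, 450)

# At or above the 7-hour minimum, as a count and as a proportion
n_enough <- sum(avg_minutes_asleep >= 420)
share_enough <- mean(avg_minutes_asleep >= 420)

# Inside the recommended 7 to 9 hour range only
n_in_range <- sum(avg_minutes_asleep >= 420 & avg_minutes_asleep <= 540)
```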

Half of the participants average less than the recommended sleep time, which suggests that many people are not getting enough sleep. I recommend creating an alert function to inform users when they are not getting enough sleep.

Act:

Based on the findings above, I have two recommendations to help influence Bellabeat marketing strategy. I think it is better to focus on what users most want rather than trying to suggest too many options.

1. Calories Burning

To support effective “calorie burning”, the device can let users choose their focus among “number of steps”, “distance” or “activity level”. It can then use AI on the collected data to give helpful suggestions, tailored to that focus, for improving their “calorie burning”.

2. Sleep Time

I recommend creating an alert function to inform users when they are not getting enough sleep.
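As a rough idea of what such an alert could look like, here is a minimal sketch. The function name, the message wording and the 420-minute default are assumptions for illustration, not an existing Bellabeat feature.

```r
# Hypothetical helper: returns an alert message when sleep is below the
# threshold, or NA when the night was long enough
sleep_alert <- function(minutes_asleep, min_recommended = 420) {
  ifelse(
    minutes_asleep < min_recommended,
    paste0("You slept ", minutes_asleep, " minutes; aim for at least ",
           min_recommended, "."),
    NA_character_
  )
}

alerts <- sleep_alert(c(360, 450))
alerts
```

A real product would probably look at several nights rather than a single one, so that one short night does not trigger an alert on its own.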

Maki

2024-12-02
