Introduction

Bellabeat is a high-tech manufacturer of health-focused products for women. To help guide the team’s future marketing strategy, I will analyze smart device data to gain insight into how consumers use their smart devices.

Background

There are two key stakeholders at the company:
- Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
- Sando Mur: Mathematician and Bellabeat’s cofounder

Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat’s marketing strategy.

Ask:

Prepare:

Downloaded the FitBit Fitness Tracker Data from https://www.kaggle.com/datasets/arashnic/fitbit/data and loaded the files (zip file, 4.12.16-5.12.16) into RStudio

library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
dailyActivity_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(dailyActivity_merged)
library(readr)
dailyCalories_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(dailyCalories_merged)
library(readr)
dailyIntensities_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
## Rows: 940 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (9): Id, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, Ve...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(dailyIntensities_merged)
library(readr)
dailySteps_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(dailySteps_merged)
library(readr)
heartrate_seconds_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
## Rows: 2483658 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Time
## dbl (2): Id, Value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(heartrate_seconds_merged)
library(readr)
hourlyCalories_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
## Rows: 22099 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hourlyCalories_merged)
library(readr)
hourlyIntensities_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
## Rows: 22099 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (3): Id, TotalIntensity, AverageIntensity
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hourlyIntensities_merged)
library(readr)
hourlySteps_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
## Rows: 22099 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hourlySteps_merged)
library(readr)
minuteCaloriesNarrow_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteCaloriesNarrow_merged.csv")
## Rows: 1325580 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(minuteCaloriesNarrow_merged)
library(readr)
minuteCaloriesWide_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteCaloriesWide_merged.csv")
## Rows: 21645 Columns: 62
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityHour
## dbl (61): Id, Calories00, Calories01, Calories02, Calories03, Calories04, Ca...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(minuteCaloriesWide_merged)
library(readr)
minuteIntensitiesNarrow_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteIntensitiesNarrow_merged.csv")
## Rows: 1325580 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, Intensity
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(minuteIntensitiesNarrow_merged)
library(readr)
minuteIntensitiesWide_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteIntensitiesWide_merged.csv")
## Rows: 21645 Columns: 62
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityHour
## dbl (61): Id, Intensity00, Intensity01, Intensity02, Intensity03, Intensity0...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(minuteIntensitiesWide_merged)
library(readr)
minuteMETsNarrow_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteMETsNarrow_merged.csv")
## Rows: 1325580 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, METs
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(minuteMETsNarrow_merged)
library(readr)
minuteSleep_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteSleep_merged.csv")
## Rows: 188521 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): date
## dbl (3): Id, value, logId
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(minuteSleep_merged)
library(readr)
minuteStepsNarrow_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/minuteStepsNarrow_merged.csv")
## Rows: 1325580 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, Steps
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(minuteStepsNarrow_merged)
library(readr)
sleepDay_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(sleepDay_merged)
library(readr)
weightLogInfo_merged <- read_csv("~/Downloads/archive (3)/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(weightLogInfo_merged)
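
The repeated read calls above could also be written as one loop over the folder. The following is only a sketch under assumptions: it uses base `read.csv()` instead of `read_csv()`, a temporary folder named `fitbit_demo`, and two tiny stand-in files with made-up values so that it runs on its own.

```r
# Hypothetical folder; the real path depends on where the zip was extracted.
data_dir <- file.path(tempdir(), "fitbit_demo")
dir.create(data_dir, showWarnings = FALSE)

# Two tiny stand-in files with made-up values so the sketch is self-contained.
write.csv(data.frame(Id = c(1, 2), Calories = c(1800, 2100)),
          file.path(data_dir, "dailyCalories_merged.csv"), row.names = FALSE)
write.csv(data.frame(Id = c(1, 2), StepTotal = c(9000, 4000)),
          file.path(data_dir, "dailySteps_merged.csv"), row.names = FALSE)

# Find every CSV, read each one, and name the list entries after the files.
csv_paths <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
fitbit <- lapply(csv_paths, read.csv)
names(fitbit) <- tools::file_path_sans_ext(basename(csv_paths))

names(fitbit)
```

Each table is then available as, for example, `fitbit$dailySteps_merged`, and the folder path only has to be written once.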

Check what information each dataset contains (by column names)

colnames(dailyActivity_merged)
##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"
colnames(dailyCalories_merged)
## [1] "Id"          "ActivityDay" "Calories"

The columns of dailyCalories_merged are already in dailyActivity_merged

Remove dailyCalories_merged from the list

Continue checking what each dataset contains (by column names)

colnames(dailyIntensities_merged)
##  [1] "Id"                       "ActivityDay"             
##  [3] "SedentaryMinutes"         "LightlyActiveMinutes"    
##  [5] "FairlyActiveMinutes"      "VeryActiveMinutes"       
##  [7] "SedentaryActiveDistance"  "LightActiveDistance"     
##  [9] "ModeratelyActiveDistance" "VeryActiveDistance"

The columns of dailyIntensities_merged are already in dailyActivity_merged

Remove dailyIntensities_merged from the list

Continue checking what each dataset contains (by column names)

colnames(dailySteps_merged)
## [1] "Id"          "ActivityDay" "StepTotal"

The columns of dailySteps_merged are already in dailyActivity_merged

Remove dailySteps_merged from the list
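
Matching column names only suggest that the smaller tables are contained in dailyActivity_merged. A minimal sketch of how the values themselves could be compared is shown here, using toy stand-ins with made-up rows and base `merge()`; the object names `daily_activity` and `daily_steps` are placeholders, not objects from this analysis.

```r
# Toy stand-ins with made-up values; the real tables have 940 rows each.
daily_activity <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(13162, 10735, 8163),
  Calories = c(1985, 1797, 1432)
)
daily_steps <- data.frame(
  Id = c(1, 1, 2),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),
  StepTotal = c(13162, 10735, 8163)
)

# Match rows on id and date, then compare the two step columns.
merged <- merge(daily_activity, daily_steps,
                by.x = c("Id", "ActivityDate"),
                by.y = c("Id", "ActivityDay"))
same_rows <- nrow(merged) == nrow(daily_steps)
same_values <- all(merged$TotalSteps == merged$StepTotal)
same_rows && same_values
```

If both checks are TRUE on the real data, dropping the smaller table loses no information.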

Continue checking what each dataset contains (by column names)

colnames(heartrate_seconds_merged)
## [1] "Id"    "Time"  "Value"

Heartrate_seconds_merged is too detailed (per second) for this analysis, so I will skip it

I opened the “hourly” and “minute” data

The “Narrow” and “Wide” files hold the same data in different formats
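
To illustrate how a “Wide” table maps onto a “Narrow” one, here is a small sketch with `tidyr::pivot_longer()`. The values are made up, only three of the sixty minute columns are shown, and the column name `minute` is my own placeholder.

```r
library(tidyr)

# Made-up wide rows: one row per hour, one column per minute (only 3 shown).
wide <- data.frame(
  Id = c(1, 1),
  ActivityHour = c("4/13/2016 12:00:00 AM", "4/13/2016 1:00:00 AM"),
  Calories00 = c(0.9, 0.8),
  Calories01 = c(1.0, 0.8),
  Calories02 = c(0.9, 0.9)
)

# Stack the minute columns into one row per minute.
narrow <- pivot_longer(
  wide,
  cols = starts_with("Calories"),
  names_to = "minute",
  names_prefix = "Calories",
  values_to = "Calories"
)

narrow
```

Two wide rows with three minute columns become six narrow rows, which is why either file can be skipped without losing data.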

For now, I will skip these detailed datasets to focus on the overall data

“MET” in minuteMETsNarrow_merged stands for metabolic equivalent. According to a Google search, “One metabolic equivalent (MET) is defined as the amount of oxygen consumed while sitting at rest and is equal to 3.5 ml O2 per kg body weight x min.”
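
As a quick worked example of that definition, the arithmetic below computes resting oxygen use for one person. The 60 kg body weight is a hypothetical value chosen only for illustration.

```r
# 1 MET = 3.5 ml of oxygen per kg of body weight per minute (definition above).
met_ml_per_kg_min <- 3.5
body_weight_kg <- 60  # hypothetical participant

# Oxygen consumed per minute while sitting at rest.
oxygen_ml_per_min <- met_ml_per_kg_min * body_weight_kg
oxygen_ml_per_min  # 210
```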

Remove the datasets above and skip them, to make things easier to organize

Now only three datasets remain: dailyActivity_merged, sleepDay_merged and weightLogInfo_merged

To make them easier to work with, rename each dataset

dailyActivity <- dailyActivity_merged
sleepDay <- sleepDay_merged
weightLog <- weightLogInfo_merged

To keep the workspace tidy, remove the “_merged” objects
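
One possible way to do this cleanup in code is sketched below. It is an assumption about how the removal could be scripted, and it works inside a scratch environment called `ws` (a placeholder) so that nothing in a real workspace is touched.

```r
# A scratch environment stands in for the global workspace in this sketch.
ws <- new.env()
assign("dailyActivity_merged", data.frame(Id = 1), envir = ws)
assign("dailyActivity", data.frame(Id = 1), envir = ws)
assign("sleepDay_merged", data.frame(Id = 1), envir = ws)

# Remove every object whose name ends in "_merged".
merged_names <- ls(ws, pattern = "_merged$")
rm(list = merged_names, envir = ws)

ls(ws)
```

In an interactive session the same pattern would be applied to the global environment, after checking `merged_names` to make sure nothing still needed is on the list.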

Clean column names to make them standard

library("janitor")
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
dailyActivity <- clean_names(dailyActivity)
sleepDay <- clean_names(sleepDay)
weightLog <- clean_names(weightLog)

Confirm cleaned column names

colnames(dailyActivity)
##  [1] "id"                         "activity_date"             
##  [3] "total_steps"                "total_distance"            
##  [5] "tracker_distance"           "logged_activities_distance"
##  [7] "very_active_distance"       "moderately_active_distance"
##  [9] "light_active_distance"      "sedentary_active_distance" 
## [11] "very_active_minutes"        "fairly_active_minutes"     
## [13] "lightly_active_minutes"     "sedentary_minutes"         
## [15] "calories"
colnames(sleepDay)
## [1] "id"                   "sleep_day"            "total_sleep_records" 
## [4] "total_minutes_asleep" "total_time_in_bed"
colnames(weightLog)
## [1] "id"               "date"             "weight_kg"        "weight_pounds"   
## [5] "fat"              "bmi"              "is_manual_report" "log_id"

For “date” to be consistent, let’s rename “activity_date” in dailyActivity and “sleep_day” in sleepDay to “date”

library("dplyr")

dailyActivity <- dailyActivity %>% 
  rename(date = activity_date)
sleepDay <- sleepDay %>%
  rename(date = sleep_day)

In dailyActivity, the column names are inconsistent: the distance column says “moderately” but the minutes column says “fairly”

To be consistent, rename “fairly_active_minutes” to “moderately_active_minutes”

Also, there are “sedentary_active_distance” and “sedentary_minutes”

To be consistent, rename “sedentary_active_distance” to “sedentary_distance” to match “sedentary_minutes”

dailyActivity <- dailyActivity %>% 
  rename(moderately_active_minutes = fairly_active_minutes, sedentary_distance = sedentary_active_distance)

Standardize the date format

library("lubridate")

dailyActivity$date <- as.Date(dailyActivity$date, format = "%m/%d/%Y")
View(dailyActivity)

SleepDay and weightLog have date AND time. We only need the date for this analysis.

Convert the date-time column to a date

sleepDay$date <- as.Date(sleepDay$date, format = "%m/%d/%Y")
View(sleepDay)
weightLog$date <- as.Date(weightLog$date, format = "%m/%d/%Y")
View(weightLog)
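
A small sketch of what this conversion does to a date-time string follows. The sample strings are illustrative and assume the month/day/year layout used above. `as.Date()` reads only the part described by `format` and ignores the trailing time.

```r
# Illustrative date-time strings in month/day/year layout.
raw_sleep_day <- c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM")

# Only the date part is parsed; the time is dropped.
parsed <- as.Date(raw_sleep_day, format = "%m/%d/%Y")
parsed
class(parsed)  # "Date"

# A string in a different layout does not raise an error; it becomes NA.
mismatch <- as.Date("2016-04-12", format = "%m/%d/%Y")
is.na(mismatch)  # TRUE
```

Because a mismatched layout silently turns into NA, it is worth running `sum(is.na(...))` on the converted column afterwards.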

Let’s check what kind of data we have in each dataset again using colnames()

colnames(dailyActivity)
##  [1] "id"                         "date"                      
##  [3] "total_steps"                "total_distance"            
##  [5] "tracker_distance"           "logged_activities_distance"
##  [7] "very_active_distance"       "moderately_active_distance"
##  [9] "light_active_distance"      "sedentary_distance"        
## [11] "very_active_minutes"        "moderately_active_minutes" 
## [13] "lightly_active_minutes"     "sedentary_minutes"         
## [15] "calories"
colnames(sleepDay)
## [1] "id"                   "date"                 "total_sleep_records" 
## [4] "total_minutes_asleep" "total_time_in_bed"
colnames(weightLog)
## [1] "id"               "date"             "weight_kg"        "weight_pounds"   
## [5] "fat"              "bmi"              "is_manual_report" "log_id"

In dailyActivity, we are not 100% sure whether the distances are in miles or meters

So let’s take away the detailed “distance” columns and focus on “minutes”

In the colnames() output, we don’t need columns 5 to 10 (keep “total_distance”, column 4, just in case)

dailyActivity_2 <- dailyActivity[, -c(5:10)]

In sleepDay, I don’t know what “total_sleep_records” means. So let’s take away this column.

sleepDay_2 <- sleepDay[, -c(3)]

In weightLog, let’s focus on weight in “kg” and take away “weight_pounds”, “fat”, “bmi”, “is_manual_report” and “log_id”.

weightLog_2 <- weightLog[, c(1:3)]
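
Dropping columns by position works, but it breaks silently if the column order ever changes. As an alternative sketch, the same columns can be dropped by name with `dplyr::select()`. The one-row table and the names `daily` and `drop_cols` are made up for illustration.

```r
library(dplyr)

# One made-up row with the same column layout as dailyActivity.
daily <- data.frame(
  id = 1, date = as.Date("2016-04-12"), total_steps = 13162,
  total_distance = 8.5, tracker_distance = 8.5,
  logged_activities_distance = 0, very_active_distance = 1.9,
  moderately_active_distance = 0.6, light_active_distance = 6.1,
  sedentary_distance = 0, very_active_minutes = 25, calories = 1985
)

# Columns to drop, listed by name instead of by position.
drop_cols <- c("tracker_distance", "logged_activities_distance",
               "very_active_distance", "moderately_active_distance",
               "light_active_distance", "sedentary_distance")

daily_2 <- daily %>% select(-all_of(drop_cols))
names(daily_2)
```

`all_of()` raises an error if a listed column is missing, so a renamed column is noticed right away.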

Next, let’s calculate average by Id.

dailyActivity_avg_by_id <- dailyActivity_2 %>%
  group_by(id) %>%
  reframe(avg_ttl_steps = mean(total_steps),
          avg_distance = mean(total_distance),
          avg_very_act_min = mean(very_active_minutes),
          avg_mod_act_min = mean(moderately_active_minutes),
          avg_light_act_min = mean(lightly_active_minutes),
          avg_sedentary_min = mean(sedentary_minutes),
          ave_cal = mean(calories))   

sleepDay_avg_by_id <- sleepDay_2 %>%
  group_by(id) %>%
  reframe(avg_minutes_asleep = mean(total_minutes_asleep),
          avg_time_in_bed = mean(total_time_in_bed))  

weightLog_avg_by_id <- weightLog_2 %>%
  group_by(id) %>%
  reframe(avg_weight_kg = mean(weight_kg)) 
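
An average per id hides how many days each participant actually logged. The sketch below uses toy data and `summarise()` rather than the `reframe()` calls above, and adds a day count next to the mean. The column name `n_days` is my own.

```r
library(dplyr)

# Made-up records: id 1 logged three days, id 2 only one.
toy <- data.frame(
  id = c(1, 1, 1, 2),
  total_steps = c(10000, 12000, 8000, 3000)
)

toy_avg <- toy %>%
  group_by(id) %>%
  summarise(n_days = n(),                       # rows (days) behind each mean
            avg_ttl_steps = mean(total_steps))

toy_avg
```

An id with only a few logged days gives a less reliable average, which matters when the per-id values are compared later.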

To make it easy to read, round values except ‘id’ column to 2 decimal places.

library(dplyr)

dailyActivity_avg_by_id_2 <- dailyActivity_avg_by_id %>% mutate(across(-id, \(x) round(x, 2)))
sleepDay_avg_by_id_2 <- sleepDay_avg_by_id %>% mutate(across(-id, \(x) round(x, 2)))
weightLog_avg_by_id_2 <- weightLog_avg_by_id %>% mutate(across(-id, \(x) round(x, 2)))

Let’s check the data type in each column.

glimpse(dailyActivity_avg_by_id_2)
## Rows: 33
## Columns: 8
## $ id                <dbl> 1503960366, 1624580081, 1644430081, 1844505072, 1927…
## $ avg_ttl_steps     <dbl> 12116.74, 5743.90, 7282.97, 2580.06, 916.13, 11370.6…
## $ avg_distance      <dbl> 7.81, 3.91, 5.30, 1.71, 0.63, 8.08, 3.45, 3.19, 6.36…
## $ avg_very_act_min  <dbl> 38.71, 8.68, 9.57, 0.13, 1.32, 36.29, 0.10, 1.35, 13…
## $ avg_mod_act_min   <dbl> 19.16, 5.81, 21.37, 1.29, 0.77, 19.35, 0.26, 2.58, 2…
## $ avg_light_act_min <dbl> 219.94, 153.48, 178.47, 115.45, 38.58, 257.45, 256.6…
## $ avg_sedentary_min <dbl> 848.16, 1257.74, 1161.87, 1206.61, 1317.42, 1112.58,…
## $ ave_cal           <dbl> 1816.42, 1483.35, 2811.30, 1573.48, 2172.81, 2509.97…
glimpse(sleepDay_avg_by_id)
## Rows: 24
## Columns: 3
## $ id                 <dbl> 1503960366, 1644430081, 1844505072, 1927972279, 202…
## $ avg_minutes_asleep <dbl> 360.2800, 294.0000, 652.0000, 417.0000, 506.1786, 6…
## $ avg_time_in_bed    <dbl> 383.2000, 346.0000, 961.0000, 437.8000, 537.6429, 6…
glimpse(weightLog_avg_by_id)
## Rows: 8
## Columns: 2
## $ id            <dbl> 1503960366, 1927972279, 2873212765, 4319703577, 45586099…
## $ avg_weight_kg <dbl> 52.60000, 133.50000, 57.00000, 72.35000, 69.64000, 90.70…

Finally, let’s merge the three tables above

Joined_AS_Avg <- left_join(dailyActivity_avg_by_id_2, sleepDay_avg_by_id_2, by = "id")
View(Joined_AS_Avg)
Joined_ASW_Avg <- left_join(Joined_AS_Avg, weightLog_avg_by_id_2, by = "id")
View(Joined_ASW_Avg)
colnames(Joined_ASW_Avg)
##  [1] "id"                 "avg_ttl_steps"      "avg_distance"      
##  [4] "avg_very_act_min"   "avg_mod_act_min"    "avg_light_act_min" 
##  [7] "avg_sedentary_min"  "ave_cal"            "avg_minutes_asleep"
## [10] "avg_time_in_bed"    "avg_weight_kg"

Analyze and Share

Let’s plot the ave_cal column against steps, distance and very active minutes to see how they relate.

library("tidyverse")

ggplot(data = Joined_ASW_Avg, aes(x = ave_cal, y = avg_ttl_steps)) + 
  geom_point() +
  ggtitle("Relationship Between Average Calories Burnt and Average Total Steps", subtitle = "(Per Day)") +
  xlab("Average Calories Burnt") +
  ylab("Average Total Steps") +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(data = Joined_ASW_Avg, aes(x = ave_cal, y = avg_distance)) + 
  geom_point() +
  ggtitle("Relationship Between Average Calories Burnt and Average Distance Walked", subtitle = "(Per Day)") +
  xlab("Average Calories Burnt") +
  ylab("Average Distance Walked (km)") +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(data = Joined_ASW_Avg, aes(x = ave_cal, y = avg_very_act_min)) + 
  geom_point() +
  ggtitle("Relationship Between Average Calories Burnt and Average Very Active Minutes", subtitle = "(Per Day)") +
  xlab("Average Calories Burnt") +
  ylab("Average Very Active Minutes") +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

To see how people are burning calories, I made three visualizations to compare “Average Calories Burnt per day” with “Average Total Steps taken”, “Average Distance Walked” and “Average Very Active Minutes”.

It was unexpected and interesting to see a dip at “Average Calories Burnt” between 2,000 and 2,250 calories. My assumption is that the dip comes from the very small sample of 33 participants and from each body burning calories at a different pace. I would expect the dip to smooth out with a larger number of participants. Overall, however, the smoothed line trends upward, which suggests that participants who take more steps, walk farther and have more very active minutes tend to burn more calories.
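
The upward trend read from the charts could also be put into numbers. This is a sketch on made-up values, not on the study data. It computes a correlation coefficient with `cor()` and the slope of a straight-line fit with `lm()`.

```r
# Made-up per-id averages, for illustration only.
toy <- data.frame(
  ave_cal = c(1500, 1800, 2100, 2400, 2700, 3000),
  avg_ttl_steps = c(3000, 6500, 5000, 8000, 9500, 12000)
)

# Pearson correlation: values near 1 mean a strong upward relationship.
r <- cor(toy$ave_cal, toy$avg_ttl_steps)

# Straight-line fit: the slope is the change in steps per extra calorie.
fit <- lm(avg_ttl_steps ~ ave_cal, data = toy)
slope <- coef(fit)[["ave_cal"]]

c(correlation = r, slope = slope)
```

On the real table, the same two calls on Joined_ASW_Avg would show whether the local dip changes the overall direction of the relationship.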

“Burning calories” has been a popular subject among women. To differentiate Bellabeat products and to go a step further than other smart devices, I suggest creating a mechanism that analyzes the collected data, lets each individual select a focus such as “number of steps”, “distance” or “activity level”, and has the device give recommendations and suggestions for improvement. This would be a good opportunity to make use of AI.

The recommended amount of sleep is 7 - 9 hours, which is 420 - 540 minutes.

To make this easier to see in the bar chart, mark the recommended range (in minutes) in light blue

library("ggplot2")

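# Treat id as a category so that each participant gets one evenly spaced bar
ggplot(data = sleepDay_avg_by_id_2, aes(x = factor(id), y = avg_minutes_asleep)) + 
  geom_hline(yintercept = 420:540, colour = "light blue") +
  geom_col() +
  ggtitle("Average Sleep per Day by ID") +
  xlab("ID") +
  ylab("Average Minutes Asleep") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))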

Count how many ids averaged at least the recommended minimum of sleep (420 minutes)

sum(sleepDay_avg_by_id_2$avg_minutes_asleep >= 420)
## [1] 12

After visualizing sleepDay_avg_by_id_2, I counted the ids whose average sleep time meets or exceeds the recommended minimum (in minutes).

Of the 24 ids with sleep data, 12 average at least the recommended sleep time, which is half of the participants.
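
The count above only checks the lower bound of 420 minutes. A sketch that splits the ids into below, within and above the 420 - 540 minute range is shown here. The values are made up and are not the study data.

```r
# Made-up average minutes asleep for six ids.
avg_minutes_asleep <- c(360, 294, 652, 417, 506, 430)

below <- avg_minutes_asleep < 420
in_range <- avg_minutes_asleep >= 420 & avg_minutes_asleep <= 540
above <- avg_minutes_asleep > 540

# Counts and shares per group.
counts <- c(below = sum(below), in_range = sum(in_range), above = sum(above))
counts
round(counts / length(avg_minutes_asleep), 2)
```

Splitting the groups this way separates ids that sleep longer than recommended from those inside the range, which the single `>= 420` count lumps together.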

This suggests that many people are not getting enough sleep: the other half of the participants average less than the recommended minimum. I recommend creating an alert function to inform users when they are not getting enough sleep.

Act:

Based on these findings, I have two recommendations to help inform Bellabeat’s marketing strategy. I think it is better to focus on what users want most rather than to suggest too many options.

1. Calorie Burning

To suggest effective “calorie burning” to users, the device can let users choose their focus among “number of steps”, “distance” or “activity level”. Then it can use AI on the collected data to give helpful suggestions, depending on the user’s focus, for improving their “calorie burning”.

2. Sleep Time

I recommend creating an alert function to inform users when they are not getting enough sleep.
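
As a rough illustration of what such an alert rule could look like, here is a hypothetical sketch. The function name `needs_sleep_alert`, the 420 minute threshold taken from the recommended minimum above, and the rule of three short nights are all my own assumptions, not a Bellabeat feature.

```r
# Hypothetical rule: alert when at least `min_short_nights` of the recent
# nights fall below `min_minutes` of sleep.
needs_sleep_alert <- function(minutes_asleep, min_minutes = 420,
                              min_short_nights = 3) {
  short_nights <- sum(minutes_asleep < min_minutes, na.rm = TRUE)
  short_nights >= min_short_nights
}

# Made-up week of sleep records (minutes per night).
last_week <- c(400, 380, 450, 410, 500, 430, 440)
needs_sleep_alert(last_week)  # TRUE: three nights are under 420 minutes
```

The thresholds are parameters so that they can be tuned per user instead of being fixed for everyone.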