Bellabeat is a high-tech manufacturer of health-focused products for women. While they are a successful small company, they have the potential to become a major player in the global smart device market. As a data analyst, I have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insights into how consumers are using their smart devices. The insights I uncover will help guide the company’s marketing strategy.
Questions
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat’s marketing strategy?
Business Tasks
Analyze consumer behavior and usage patterns on Fitbit smart devices to identify the key factors that shape usage trends, and use those insights to inform Bellabeat’s marketing strategy.
The stakeholder provided the data for this case study: the FitBit Fitness Tracker Data, which is available on Kaggle and was made accessible through Mobius.
This dataset was generated by respondents to a survey distributed via Amazon Mechanical Turk between March 12, 2016 and May 12, 2016 (written 12/03/2016 to 12/05/2016 in day/month/year order). Thirty eligible Fitbit users consented to submit personal tracker data, including minute-level records of physical activity, heart rate, and sleep monitoring.
There are potential issues with bias and credibility in this dataset. Since it consists of only 30 users and lacks demographic information, there is a risk of sampling bias, making it uncertain whether the sample accurately represents the entire population. Additionally, the dataset is not up-to-date and was collected over a limited period of two months, which may impact its reliability. The small sample size and short data collection period could lead to skewed insights that may not be generalizable to a larger audience. Without a more diverse and comprehensive dataset, drawing meaningful and reliable conclusions becomes challenging.
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("janitor")
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library("lubridate")
library("ggplot2")
library("ggpubr")
library("tidyr")
library("dplyr")
daily_activity <- read.csv("fitabase4-5/dailyActivity_merged.csv")
daily_sleep <- read.csv("fitabase4-5/sleepDay_merged.csv")
hourly_intensities <- read.csv("fitabase4-5/hourlyIntensities_merged.csv")
hourly_calories <- read.csv("fitabase4-5/hourlyCalories_merged.csv")
# data_clean
data_clean_remove <- remove_empty(daily_activity, which = c("rows","cols"), quiet = FALSE)
## No empty rows to remove.
## No empty columns to remove.
# data_clean_remove
data_activityfiltered <- data_clean_remove %>% filter(LoggedActivitiesDistance!= 0)
data_activityfiltered <- data_activityfiltered %>%
distinct() %>%
drop_na()
# data_filtered
data_duplicate <- sum(duplicated(data_activityfiltered))
# duplicates
data_duplicate
## [1] 0
head(data_activityfiltered)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 6775888955 4/26/2016 7091 5.27 5.27
## 2 6962181067 4/21/2016 11835 9.71 7.88
## 3 6962181067 4/25/2016 13239 9.27 9.08
## 4 6962181067 5/9/2016 12342 8.72 8.68
## 5 7007744171 4/12/2016 14172 10.29 9.48
## 6 7007744171 4/13/2016 12862 9.65 8.60
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 1.959596 3.48 0.87
## 2 4.081692 3.99 2.10
## 3 2.785175 3.02 1.68
## 4 3.167822 3.90 1.18
## 5 4.869783 4.50 0.38
## 6 4.851307 4.61 0.56
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 0.73 0.00 42
## 2 3.51 0.11 53
## 3 4.46 0.10 35
## 4 3.65 0.00 43
## 5 5.41 0.00 53
## 6 4.48 0.00 56
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 30 47 1321 2584
## 2 27 214 708 2179
## 3 31 282 637 2194
## 4 21 231 607 2105
## 5 8 355 1024 2937
## 6 22 261 1101 2742
# data_clean
data1_clean_remove <- remove_empty(daily_sleep, which = c("rows","cols"), quiet = FALSE)
## No empty rows to remove.
## No empty columns to remove.
# data_clean_remove
data1_sleepfiltered <- data1_clean_remove %>% filter(TotalSleepRecords!= 0)
data1_sleepfiltered <- data1_sleepfiltered %>%
distinct() %>%
drop_na()
# data_filtered
data1_duplicate <- sum(duplicated(data1_sleepfiltered))
# duplicates
data1_duplicate
## [1] 0
head(data1_sleepfiltered)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
# data_clean
# Keep the original column names (ActivityHour, TotalIntensity), which later steps rely on
data2_clean_remove <- remove_empty(hourly_intensities, which = c("rows","cols"), quiet = FALSE)
## No empty rows to remove.
## No empty columns to remove.
# data_clean_remove
data2_intensitiesfiltered <- data2_clean_remove %>% filter(TotalIntensity!= 0)
data2_intensitiesfiltered <- data2_intensitiesfiltered %>%
distinct() %>%
drop_na()
# data_filtered
data2_duplicate <- sum(duplicated(data2_intensitiesfiltered))
# duplicates
data2_duplicate
## [1] 0
head(data2_intensitiesfiltered)
## Id ActivityHour TotalIntensity AverageIntensity
## 1 1503960366 4/12/2016 12:00:00 AM 20 0.333333
## 2 1503960366 4/12/2016 1:00:00 AM 8 0.133333
## 3 1503960366 4/12/2016 2:00:00 AM 7 0.116667
## 4 1503960366 4/12/2016 8:00:00 AM 13 0.216667
## 5 1503960366 4/12/2016 9:00:00 AM 30 0.500000
## 6 1503960366 4/12/2016 10:00:00 AM 29 0.483333
# data_clean
data3_clean_remove <- remove_empty(hourly_calories, which = c("rows","cols"), quiet = FALSE)
## No empty rows to remove.
## No empty columns to remove.
# data_clean_remove
data3_caloriesfiltered <- data3_clean_remove %>% filter(Calories!= 0)
data3_caloriesfiltered <- data3_caloriesfiltered %>%
distinct() %>%
drop_na()
# data_filtered
data3_duplicate <- sum(duplicated(data3_caloriesfiltered))
# duplicates
data3_duplicate
## [1] 0
head(data3_caloriesfiltered)
## Id ActivityHour Calories
## 1 1503960366 4/12/2016 12:00:00 AM 81
## 2 1503960366 4/12/2016 1:00:00 AM 61
## 3 1503960366 4/12/2016 2:00:00 AM 59
## 4 1503960366 4/12/2016 3:00:00 AM 47
## 5 1503960366 4/12/2016 4:00:00 AM 48
## 6 1503960366 4/12/2016 5:00:00 AM 48
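The four cleaning passes above repeat the same steps: filter out zero values, drop duplicate rows, and drop rows with missing values. As a minimal sketch, those steps could be wrapped in one helper. The function name `clean_tracker_data`, its `value_col` argument, and the toy data frame are hypothetical and not part of the original analysis; the `remove_empty()` call from janitor is left out to keep the example short.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not in the original analysis): keep rows where
# `value_col` is non-zero, then drop duplicate rows and rows containing NA.
clean_tracker_data <- function(df, value_col) {
  df %>%
    filter(.data[[value_col]] != 0) %>%
    distinct() %>%
    drop_na()
}

# Toy data standing in for one of the Fitbit tables (values are made up).
toy <- data.frame(
  Id = c(1, 1, 2, 2, 3),
  Calories = c(81, 81, 0, 61, NA)
)

cleaned <- clean_tracker_data(toy, "Calories")
print(cleaned)
```

Note that `filter()` already drops rows where the condition evaluates to `NA`, so `drop_na()` here only affects missing values in the other columns.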
daily_activity <- data_activityfiltered %>%
rename(date = ActivityDate) %>%
mutate(date = as_date(date,format = "%m/%d/%Y"))
head(daily_activity)
## Id date TotalSteps TotalDistance TrackerDistance
## 1 6775888955 2016-04-26 7091 5.27 5.27
## 2 6962181067 2016-04-21 11835 9.71 7.88
## 3 6962181067 2016-04-25 13239 9.27 9.08
## 4 6962181067 2016-05-09 12342 8.72 8.68
## 5 7007744171 2016-04-12 14172 10.29 9.48
## 6 7007744171 2016-04-13 12862 9.65 8.60
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 1.959596 3.48 0.87
## 2 4.081692 3.99 2.10
## 3 2.785175 3.02 1.68
## 4 3.167822 3.90 1.18
## 5 4.869783 4.50 0.38
## 6 4.851307 4.61 0.56
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 0.73 0.00 42
## 2 3.51 0.11 53
## 3 4.46 0.10 35
## 4 3.65 0.00 43
## 5 5.41 0.00 53
## 6 4.48 0.00 56
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 30 47 1321 2584
## 2 27 214 708 2179
## 3 31 282 637 2194
## 4 21 231 607 2105
## 5 8 355 1024 2937
## 6 22 261 1101 2742
daily_sleep <- data1_sleepfiltered %>%
rename(date = SleepDay) %>%
# as_date() ignores `tz`, so the argument is omitted to avoid a warning
mutate(date = as_date(date, format = "%m/%d/%Y %I:%M:%S %p"))
head(daily_sleep)
## Id date TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 2016-04-12 1 327 346
## 2 1503960366 2016-04-13 2 384 407
## 3 1503960366 2016-04-15 1 412 442
## 4 1503960366 2016-04-16 2 340 367
## 5 1503960366 2016-04-17 1 700 712
## 6 1503960366 2016-04-19 1 304 320
hourly_intensities <- data2_intensitiesfiltered %>%
rename(date_time = ActivityHour) %>%
# Format explicitly so midnight keeps its "00:00:00" part and separate() always finds two pieces
mutate(date_time = format(as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()), "%Y-%m-%d %H:%M:%S"))
hourly_intensities <- separate(hourly_intensities, date_time, into = c("date", "time"), sep = " ")
head(hourly_intensities)
## Id date time TotalIntensity AverageIntensity
## 1 1503960366 2016-04-12 00:00:00 20 0.333333
## 2 1503960366 2016-04-12 01:00:00 8 0.133333
## 3 1503960366 2016-04-12 02:00:00 7 0.116667
## 4 1503960366 2016-04-12 08:00:00 13 0.216667
## 5 1503960366 2016-04-12 09:00:00 30 0.500000
## 6 1503960366 2016-04-12 10:00:00 29 0.483333
hourly_calories <- data3_caloriesfiltered %>%
rename(date_time = ActivityHour) %>%
# Format explicitly so midnight keeps its "00:00:00" part and separate() always finds two pieces
mutate(date_time = format(as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()), "%Y-%m-%d %H:%M:%S"))
hourly_calories <- separate(hourly_calories, date_time, into = c("date", "time"), sep = " ")
head(hourly_calories)
## Id date time Calories
## 1 1503960366 2016-04-12 00:00:00 81
## 2 1503960366 2016-04-12 01:00:00 61
## 3 1503960366 2016-04-12 02:00:00 59
## 4 1503960366 2016-04-12 03:00:00 47
## 5 1503960366 2016-04-12 04:00:00 48
## 6 1503960366 2016-04-12 05:00:00 48
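The hourly tables store timestamps in 12-hour format, such as "4/12/2016 12:00:00 AM". The sketch below shows one way to split such values into a date and a time with base R only. The sample strings and the fixed "UTC" time zone are assumptions chosen for reproducibility, and `%p` parsing depends on the session locale recognizing AM/PM.

```r
# Sample timestamps in the same 12-hour format as ActivityHour.
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/2016 1:00:00 PM")

# Assumption: a fixed "UTC" time zone, used here only for reproducibility.
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# format() with an explicit pattern always emits the time part, including midnight.
hourly <- data.frame(
  date = format(parsed, "%Y-%m-%d"),
  time = format(parsed, "%H:%M:%S")
)
print(hourly)
```

Building the two columns with `format()` avoids relying on how a date-time is converted to text, which is where a midnight value can lose its time component.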
# The identifier column is named `Id` (capital I). Using `$id` returns NULL,
# so n_distinct() reported 0 for every table.
n_distinct(daily_activity$Id)
n_distinct(daily_sleep$Id)
n_distinct(hourly_intensities$Id)
n_distinct(hourly_calories$Id)
nrow(daily_activity)
## [1] 32
nrow(daily_sleep)
## [1] 410
nrow(hourly_intensities)
## [1] 13002
nrow(hourly_calories)
## [1] 22099
daily_activity %>%
select(TotalSteps,
TotalDistance,
SedentaryMinutes)%>%
summary()
## TotalSteps TotalDistance SedentaryMinutes
## Min. : 6064 Min. : 4.810 Min. : 607.0
## 1st Qu.: 9035 1st Qu.: 7.165 1st Qu.: 722.2
## Median :12634 Median : 9.690 Median : 812.0
## Mean :12042 Mean : 9.147 Mean : 870.6
## 3rd Qu.:14178 3rd Qu.:10.623 3rd Qu.:1028.2
## Max. :20067 Max. :14.300 Max. :1321.0
daily_sleep %>%
select(TotalSleepRecords,
TotalMinutesAsleep,
TotalTimeInBed) %>%
summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.00 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.00 1st Qu.:361.0 1st Qu.:403.8
## Median :1.00 Median :432.5 Median :463.0
## Mean :1.12 Mean :419.2 Mean :458.5
## 3rd Qu.:1.00 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.00 Max. :796.0 Max. :961.0
hourly_intensities %>%
select(TotalIntensity,
AverageIntensity) %>%
summary()
## TotalIntensity AverageIntensity
## Min. : 1.00 Min. :0.01667
## 1st Qu.: 6.00 1st Qu.:0.10000
## Median : 13.00 Median :0.21667
## Mean : 20.46 Mean :0.34093
## 3rd Qu.: 25.00 3rd Qu.:0.41667
## Max. :180.00 Max. :3.00000
ggplot(data=daily_activity, aes(x=TotalSteps, y=SedentaryMinutes)) + geom_point()
ggplot(data=daily_sleep, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) + geom_point()
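A scatter plot shows the shape of a relationship, and a correlation coefficient puts a number on it. As a rough sketch, the snippet below computes the Pearson correlation between steps and sedentary minutes for the six rows printed by `head(daily_activity)` earlier. Six rows are far too few to support a conclusion, so this only illustrates the call; the data frame name `activity_head` is hypothetical.

```r
# The six rows shown by head(daily_activity) earlier in this report.
activity_head <- data.frame(
  TotalSteps       = c(7091, 11835, 13239, 12342, 14172, 12862),
  SedentaryMinutes = c(1321, 708, 637, 607, 1024, 1101)
)

# Pearson correlation between daily steps and sedentary minutes.
r <- cor(activity_head$TotalSteps, activity_head$SedentaryMinutes)
print(round(r, 3))
```

Running the same `cor()` call on the full `daily_activity` and `daily_sleep` tables would give the values that correspond to the two plots.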
combined_data <- merge(daily_sleep, daily_activity, by="Id", all=TRUE)
n_distinct(combined_data$Id)
## [1] 24
head(combined_data)
## Id date.x TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 2016-04-12 1 327 346
## 2 1503960366 2016-04-13 2 384 407
## 3 1503960366 2016-04-15 1 412 442
## 4 1503960366 2016-04-16 2 340 367
## 5 1503960366 2016-04-17 1 700 712
## 6 1503960366 2016-04-19 1 304 320
## date.y TotalSteps TotalDistance TrackerDistance LoggedActivitiesDistance
## 1 <NA> NA NA NA NA
## 2 <NA> NA NA NA NA
## 3 <NA> NA NA NA NA
## 4 <NA> NA NA NA NA
## 5 <NA> NA NA NA NA
## 6 <NA> NA NA NA NA
## VeryActiveDistance ModeratelyActiveDistance LightActiveDistance
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## SedentaryActiveDistance VeryActiveMinutes FairlyActiveMinutes
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## LightlyActiveMinutes SedentaryMinutes Calories
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
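Because the merge above uses `Id` as the only key, each sleep record of a user is paired with every activity record of that user, which is why the result carries both `date.x` and `date.y`. The sketch below contrasts that with a merge on both `Id` and `date`, using made-up toy tables. It is an illustration of the alternative, not a rerun of the analysis.

```r
# Toy tables (values are made up) with the same key columns as the report.
sleep <- data.frame(
  Id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalMinutesAsleep = c(327, 384, 412)
)
activity <- data.frame(
  Id = c(1, 1, 3),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(7091, 11835, 13239)
)

# Key on Id only: every sleep day of a user is paired with every activity day.
by_id <- merge(sleep, activity, by = "Id", all = TRUE)

# Key on Id and date: one row per user and day.
by_id_date <- merge(sleep, activity, by = c("Id", "date"), all = TRUE)

print(nrow(by_id))
print(nrow(by_id_date))
```

With the two-column key, each row describes a single user on a single day, so sleep and activity values in the same row can be compared directly.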
ggplot(data=daily_sleep, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) +
geom_ribbon(aes(ymin=0, ymax=TotalTimeInBed), alpha=0.2, fill="blue", color="blue") +
geom_point(color="red") +
labs(title="Total Time Asleep vs. Total Time in Bed")
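The gap between time in bed and time asleep can also be expressed directly. As a minimal sketch, the snippet below derives minutes awake in bed and a sleep-efficiency ratio for the six rows printed by `head(daily_sleep)` earlier. The column names `MinutesAwakeInBed` and `SleepEfficiency` are hypothetical additions, not fields of the dataset.

```r
# The six rows shown by head(daily_sleep) earlier in this report.
sleep_head <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  TotalTimeInBed     = c(346, 407, 442, 367, 712, 320)
)

# Minutes spent in bed but not asleep.
sleep_head$MinutesAwakeInBed <- sleep_head$TotalTimeInBed - sleep_head$TotalMinutesAsleep

# Share of time in bed that was spent asleep (between 0 and 1).
sleep_head$SleepEfficiency <- sleep_head$TotalMinutesAsleep / sleep_head$TotalTimeInBed

print(sleep_head)
```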
6. Act: Recommendations
Bellabeat designs smart devices that help women track their daily habits and health. This case study analyzes user data to provide insights that can help improve sales and user experience.
However, the dataset is small and lacks demographic details, which may lead to biased results. To improve accuracy, I recommend collecting more diverse data to better understand the target audience and refine marketing strategies.
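The summaries above point to two patterns that could inform the Bellabeat app’s features and marketing. In the sleep records, users averaged about 419 minutes asleep against about 458 minutes in bed. In the activity records, sedentary time averaged about 871 minutes (roughly 14.5 hours) per day. Features such as bedtime and inactivity reminders would speak to these habits, although the activity figures rest on only 32 daily records and should be treated as tentative.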