## 1. Ask: Business Task
Bellabeat is a high-growth wellness technology company for women. The co-founder, Urška Sršen, believes that analyzing smart device usage data can unlock new marketing opportunities.
Business Task:
Analyze non-Bellabeat smart device usage data to identify trends in user
behavior, then apply these insights to one Bellabeat product (e.g., the
Bellabeat app or Leaf tracker) to guide marketing strategy.
Key Questions:

1. What are the trends in smart device usage?
2. How can these trends apply to Bellabeat customers?
3. How can they influence Bellabeat’s marketing strategy?
Stakeholders:

- Urška Sršen, Co-founder & Chief Creative Officer
- Sando Mur, Co-founder
- Bellabeat Marketing Analytics Team

## 2. Prepare: Data Source
Dataset: FitBit Fitness Tracker Data (Kaggle, CC0
Public Domain)
Link: https://www.kaggle.com/datasets/arashnic/fitbit
Description:

- 30 Fitbit users
- Daily and minute-level data on activity, heart rate, and sleep
- Period: April 12 – May 12, 2016
ROCCC Evaluation:

- Reliability: Medium (small sample)
- Originality: High (real user data)
- Comprehensiveness: Medium (missing demographics)
- Currentness: Low (2016 data)
- Citedness: Low (not peer-reviewed)
Limitations:

- Small, non-representative sample
- Outdated (2016)
- No age, gender, or location data

## 3. Process: Data Cleaning
# Load libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
# Load and clean data
daily <- read_csv("data/dailyActivity_merged.csv") %>% clean_names()
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleep <- read_csv("data/sleepDay_merged.csv") %>% clean_names()
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert dates
daily <- daily %>% mutate(date = mdy(activity_date))
sleep <- sleep %>% mutate(date = mdy_hms(sleep_day) %>% as_date())
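The two date columns are parsed with different functions because they are stored in different formats. A minimal sketch of the idea follows; the sample strings are hypothetical stand-ins for the values I assume the `ActivityDate` and `SleepDay` columns contain, not rows copied from the CSV files.

```r
library(lubridate)

# Hypothetical sample values (assumed formats, not taken from the data files)
activity_date <- c("4/12/2016", "5/1/2016")
sleep_day     <- c("4/12/2016 12:00:00 AM", "5/1/2016 12:00:00 AM")

# mdy() parses month/day/year strings directly to Date
daily_dates <- mdy(activity_date)

# mdy_hms() parses to a date-time (UTC by default); as_date() drops the time part
sleep_dates <- as_date(mdy_hms(sleep_day))

# Both columns need the same class for a join on `date` to match
class(daily_dates)
class(sleep_dates)
all(daily_dates == sleep_dates)

# Strings that fail to parse become NA, so it is worth counting them
sum(is.na(daily_dates)) + sum(is.na(sleep_dates))
```

If the last line returns anything other than 0 on the real data, the parsing function does not match the stored format and should be revisited before joining.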
# Join data
combined <- inner_join(daily, sleep, by = c("id", "date"))
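Because `inner_join()` keeps only the days that appear in both tables, it can be useful to check for duplicate rows and for days that are dropped. The sketch below shows one way to do that on toy data; `daily_toy`, `sleep_toy`, and their values are hypothetical and only mimic the shape of the cleaned tables.

```r
library(dplyr)

# Toy stand-ins for the cleaned `daily` and `sleep` tables (hypothetical values)
daily_toy <- data.frame(
  id = c(1, 1, 2, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12", "2016-04-13")),
  total_steps = c(4000, 12000, 8000, 9500)
)
sleep_toy <- data.frame(
  id = c(1, 1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13", "2016-04-12")),
  total_minutes_asleep = c(400, 400, 380, 450)
)

# Exact duplicate rows would be counted twice after the join
n_duplicates <- sum(duplicated(sleep_toy))
sleep_dedup <- distinct(sleep_toy)

# Days with activity but no sleep record are dropped by inner_join()
unmatched <- anti_join(daily_toy, sleep_dedup, by = c("id", "date"))

combined_toy <- inner_join(daily_toy, sleep_dedup, by = c("id", "date"))

n_duplicates        # 1 duplicate sleep row in the toy data
nrow(unmatched)     # 1 activity day without a sleep record
nrow(combined_toy)  # 3 rows survive the join
```

Running the same checks on the real `daily` and `sleep` tables would show how many of the 940 activity rows are lost because no sleep record exists for that day.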
# Create new variables
combined <- combined %>%
  mutate(
    activity_level = case_when(
      total_steps < 5000 ~ "Low",
      total_steps < 10000 ~ "Moderate",
      TRUE ~ "High"
    ),
    hours_asleep = total_minutes_asleep / 60
  )
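`case_when()` returns a character column, so plots and tables sort the levels alphabetically (High, Low, Moderate). One possible refinement, shown here on hypothetical step counts, is to convert the result to a factor with an explicit level order. The thresholds are the ones used above; `steps_toy` is an illustrative name.

```r
library(dplyr)

# Hypothetical step counts, including values at and just below the thresholds
steps_toy <- data.frame(total_steps = c(3200, 7400, 10000, 15800, 4999))

steps_toy <- steps_toy %>%
  mutate(
    activity_level = case_when(
      total_steps < 5000 ~ "Low",
      total_steps < 10000 ~ "Moderate",
      TRUE ~ "High"
    ),
    # Fix the level order so plots and tables read Low, Moderate, High
    activity_level = factor(activity_level, levels = c("Low", "Moderate", "High"))
  )

# Counts per level, now listed in the intended order
count(steps_toy, activity_level)
```

Note that a day with exactly 10,000 steps falls into "High" under these rules, because both comparisons are strict.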
ggplot(combined, aes(x = activity_level)) +
  geom_bar(fill = "#FF69B4") +
  labs(title = "User Activity Levels", x = "Level", y = "Count") +
  theme_minimal()
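The bar chart counts rows of `combined`, and each row is one user-day rather than one user. To describe users instead of days, the data can be summarised per `id` first. The following is a sketch on hypothetical values; `days_toy`, `per_user`, and the column names `mean_steps` and `share_days_10k` are illustrative, not part of the original analysis.

```r
library(dplyr)

# Hypothetical user-day records: three users with three days each
days_toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  total_steps = c(4000, 6500, 9000, 12000, 11000, 9800, 7000, 10500, 8200)
)

# One row per user: average daily steps and share of days at or above 10,000
per_user <- days_toy %>%
  group_by(id) %>%
  summarise(
    mean_steps = mean(total_steps),
    share_days_10k = mean(total_steps >= 10000),
    .groups = "drop"
  )

# Share of users whose average day falls below 10,000 steps
share_users_below <- mean(per_user$mean_steps < 10000)

per_user
share_users_below
```

Applied to `combined`, the same summary would separate two different statements: how many days fall short of 10,000 steps, and how many users fall short on average.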
ggplot(combined, aes(x = total_steps, y = hours_asleep)) +
  geom_point(alpha = 0.6, color = "#00BFFF") +
  geom_smooth(method = 'lm') +
  labs(title = "Steps vs. Sleep Duration", x = "Daily Steps", y = "Hours Asleep") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
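The fitted line shows the direction of the relationship but not its strength. A minimal sketch of how the trend could be quantified is given below, using base R only. The `toy` data frame holds hypothetical values standing in for `combined$total_steps` and `combined$hours_asleep`, so the numbers it produces say nothing about the Fitbit data.

```r
# Hypothetical values standing in for the real columns of `combined`
toy <- data.frame(
  total_steps  = c(2500, 4800, 6100, 7900, 9400, 11200, 13000),
  hours_asleep = c(7.9, 6.4, 7.2, 6.8, 7.5, 6.1, 7.0)
)

# Pearson correlation: the sign gives direction, the magnitude gives strength
r <- cor(toy$total_steps, toy$hours_asleep)

# The same linear model that geom_smooth(method = "lm") draws
fit <- lm(hours_asleep ~ total_steps, data = toy)
slope_per_1000_steps <- unname(coef(fit)["total_steps"]) * 1000

# cor.test() adds a confidence interval and a p-value
ct <- cor.test(toy$total_steps, toy$hours_asleep)

round(r, 2)
round(slope_per_1000_steps, 3)
ct$conf.int
```

On the real data the rows are repeated measurements from the same users, so an interval from `cor.test()` treats days as independent and should be read as a rough guide only.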
Based on the analysis, I recommend the following marketing strategies for the Bellabeat app and Leaf tracker:

1. Launch a “7-Day Sleep Challenge”: encourage users to increase their daily steps to improve their sleep. The app can send personalized notifications.
2. Create weekly step challenges: use social features to engage users, and reward achievements with digital badges.
3. Personalize content based on activity: if a user is inactive, suggest short walks; if a user is active, offer mindfulness content for recovery.
These strategies align with Bellabeat’s mission of holistic wellness and can increase user engagement and retention.

## Conclusion
The analysis indicates that most users do not reach 10,000 steps daily and that there is a positive trend between daily steps and hours asleep. By leveraging these insights, Bellabeat can enhance user engagement through personalized, data-driven marketing campaigns that promote both physical activity and better sleep.