Bellabeat is a high-tech company that makes health-focused products for women. Although it is a successful small business, Bellabeat has the potential to become a larger player in the global smart device market. Urška Sršen, co-founder and Chief Creative Officer, believes that analyzing smart device usage data can reveal new growth opportunities for the company.
This analysis aims to identify growth opportunities and provide recommendations for optimizing Bellabeat’s marketing strategy by examining trends in smart device usage.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
For this project, I will use FitBit Fitness Tracker Data.
activity <- read.csv("dailyActivity_merged.csv") # Daily activity data
calories <- read.csv("hourlyCalories_merged.csv") # Hourly calorie data
intensities <- read.csv("hourlyIntensities_merged.csv") # Hourly intensity data
sleep <- read.csv("sleepDay_merged.csv") # Daily sleep data
weight <- read.csv("weightLogInfo_merged.csv") # Weight log data
I’ve reviewed the data in Excel and now just need to verify that everything imported successfully by using the head() function.
head(activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
head(calories)
## Id ActivityHour Calories
## 1 1503960366 4/12/2016 12:00:00 AM 81
## 2 1503960366 4/12/2016 1:00:00 AM 61
## 3 1503960366 4/12/2016 2:00:00 AM 59
## 4 1503960366 4/12/2016 3:00:00 AM 47
## 5 1503960366 4/12/2016 4:00:00 AM 48
## 6 1503960366 4/12/2016 5:00:00 AM 48
head(intensities)
## Id ActivityHour TotalIntensity AverageIntensity
## 1 1503960366 4/12/2016 12:00:00 AM 20 0.333333
## 2 1503960366 4/12/2016 1:00:00 AM 8 0.133333
## 3 1503960366 4/12/2016 2:00:00 AM 7 0.116667
## 4 1503960366 4/12/2016 3:00:00 AM 0 0.000000
## 5 1503960366 4/12/2016 4:00:00 AM 0 0.000000
## 6 1503960366 4/12/2016 5:00:00 AM 0 0.000000
head(sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
head(weight)
## Id Date WeightKg WeightPounds Fat BMI
## 1 1503960366 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## 2 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 3 1927972279 4/13/2016 1:08:52 AM 133.5 294.3171 NA 47.54
## 4 2873212765 4/21/2016 11:59:59 PM 56.7 125.0021 NA 21.45
## 5 2873212765 5/12/2016 11:59:59 PM 57.3 126.3249 NA 21.69
## 6 4319703577 4/17/2016 11:59:59 PM 72.4 159.6147 25 27.45
## IsManualReport LogId
## 1 True 1.462234e+12
## 2 True 1.462320e+12
## 3 False 1.460510e+12
## 4 True 1.461283e+12
## 5 True 1.463098e+12
## 6 True 1.460938e+12
The timestamp columns were imported as text rather than as dates. Before analysis, I need to convert them to a date-time format and split them into separate date and time columns.
# intensities
intensities$ActivityHour <- as.POSIXct(intensities$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
intensities$time <- format(intensities$ActivityHour, format = "%H:%M:%S")
intensities$date <- format(intensities$ActivityHour, format = "%m/%d/%y")
# calories
calories$ActivityHour <- as.POSIXct(calories$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
calories$time <- format(calories$ActivityHour, format = "%H:%M:%S")
calories$date <- format(calories$ActivityHour, format = "%m/%d/%y")
# activity
activity$ActivityDate <- as.POSIXct(activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
activity$date <- format(activity$ActivityDate, format = "%m/%d/%y")
# sleep
sleep$SleepDay <- as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")
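A failed conversion returns NA silently, so it can be worth counting parse failures after a step like the one above. The following is a minimal sketch under stated assumptions: `toy_hourly` is a made-up data frame in the same text format as the hourly files, and the fixed `"UTC"` time zone is chosen only for reproducibility (the analysis itself uses `Sys.timezone()`). The `%p` specifier also assumes an English-style AM/PM locale.

```r
# Toy rows in the same text format as the hourly CSV files (hypothetical values).
toy_hourly <- data.frame(
  ActivityHour = c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/2016 5:00:00 PM"),
  stringsAsFactors = FALSE
)

# Parse the 12-hour timestamps; tz = "UTC" is an assumption made for reproducibility.
parsed <- as.POSIXct(toy_hourly$ActivityHour,
                     format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Rows that failed to parse come back as NA, so count them explicitly.
n_failed <- sum(is.na(parsed))

# Split into separate time and date strings, mirroring the columns built in the analysis.
toy_hourly$time <- format(parsed, format = "%H:%M:%S")
toy_hourly$date <- format(parsed, format = "%m/%d/%y")
```

If `n_failed` is greater than zero, the format string or the locale probably does not match the data.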
With everything now prepared, I can begin exploring the datasets:
n_distinct(activity$Id)
## [1] 33
n_distinct(calories$Id)
## [1] 33
n_distinct(intensities$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 24
n_distinct(weight$Id)
## [1] 8
This information provides insight into the number of participants across each dataset.
The activity, calories, and intensities datasets each contain data from 33 participants, while the sleep dataset includes 24 participants, and the weight dataset only has data for 8 participants. The limited number in the weight dataset is insufficient to form reliable recommendations or conclusions.
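Participant counts alone do not show how much the datasets overlap. As a rough sketch, set operations on the Id columns can answer that. The vectors below are hypothetical stand-ins for `activity$Id`, `sleep$Id` and `weight$Id`, not values from the real files.

```r
# Hypothetical Id vectors standing in for activity$Id, sleep$Id and weight$Id.
activity_ids <- c(1, 2, 3, 4, 5)
sleep_ids    <- c(1, 2, 4)
weight_ids   <- c(2, 6)

# Participants who appear in both the activity and the sleep data.
in_both <- intersect(unique(activity_ids), unique(sleep_ids))

# Participants with weight logs but no activity records.
weight_only <- setdiff(unique(weight_ids), unique(activity_ids))

# Share of activity participants who also have sleep data.
sleep_coverage <- length(in_both) / length(unique(activity_ids))
```

A low coverage share would mean that any result from a joined table describes only a subset of the participants.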
Next, let’s review the summary statistics of the datasets:
# activity
activity %>%
select(TotalSteps,
TotalDistance,
SedentaryMinutes, Calories) %>%
summary()
## TotalSteps TotalDistance SedentaryMinutes Calories
## Min. : 0 Min. : 0.000 Min. : 0.0 Min. : 0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8 1st Qu.:1828
## Median : 7406 Median : 5.245 Median :1057.5 Median :2134
## Mean : 7638 Mean : 5.490 Mean : 991.2 Mean :2304
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5 3rd Qu.:2793
## Max. :36019 Max. :28.030 Max. :1440.0 Max. :4900
# explore num of active minutes per category
activity %>%
select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
summary()
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0
## Median : 4.00 Median : 6.00 Median :199.0
## Mean : 21.16 Mean : 13.56 Mean :192.8
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0
## Max. :210.00 Max. :143.00 Max. :518.0
# calories
calories %>%
select(Calories) %>%
summary()
## Calories
## Min. : 42.00
## 1st Qu.: 63.00
## Median : 83.00
## Mean : 97.39
## 3rd Qu.:108.00
## Max. :948.00
# sleep
sleep %>%
select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.000 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.000 1st Qu.:361.0 1st Qu.:403.0
## Median :1.000 Median :433.0 Median :463.0
## Mean :1.119 Mean :419.5 Mean :458.6
## 3rd Qu.:1.000 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.000 Max. :796.0 Max. :961.0
# weight
weight %>%
select(WeightKg, BMI) %>%
summary()
## WeightKg BMI
## Min. : 52.60 Min. :21.45
## 1st Qu.: 61.40 1st Qu.:23.96
## Median : 62.50 Median :24.39
## Mean : 72.04 Mean :25.19
## 3rd Qu.: 85.05 3rd Qu.:25.56
## Max. :133.50 Max. :47.54
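Key insights from the summary:
Average sedentary time is 991 minutes, or about 16.5 hours per day.
Most recorded activity is light: lightly active time averages 193 minutes per day, compared with 21 very active minutes and 14 fairly active minutes.
Participants typically log one sleep record per day and sleep about 419 minutes (roughly 7 hours) on average.
Average total steps per day are 7,638.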
Before visualizing the data, I will merge the activity and sleep datasets. Using an inner join on the ‘Id’ and ‘date’ columns (created earlier by converting data to datetime format) will ensure all relevant records are accurately combined for analysis.
merged_data <- merge(sleep, activity, by=c('Id', 'date'))
head(merged_data)
## Id date SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 04/12/16 2016-04-12 1 327
## 2 1503960366 04/13/16 2016-04-13 2 384
## 3 1503960366 04/15/16 2016-04-15 1 412
## 4 1503960366 04/16/16 2016-04-16 2 340
## 5 1503960366 04/17/16 2016-04-17 1 700
## 6 1503960366 04/19/16 2016-04-19 1 304
## TotalTimeInBed ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 346 2016-04-12 13162 8.50 8.50
## 2 407 2016-04-13 10735 6.97 6.97
## 3 442 2016-04-15 9762 6.28 6.28
## 4 367 2016-04-16 12669 8.16 8.16
## 5 712 2016-04-17 9705 6.48 6.48
## 6 320 2016-04-19 15506 9.88 9.88
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.14 1.26
## 4 0 2.71 0.41
## 5 0 3.19 0.78
## 6 0 3.53 1.32
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 2.83 0 29
## 4 5.04 0 36
## 5 2.51 0 38
## 6 5.03 0 50
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 34 209 726 1745
## 4 10 221 773 1863
## 5 20 164 539 1728
## 6 31 264 775 2035
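An inner join keeps only the rows whose keys match in both tables, and duplicated rows on either side are carried into the result. The sketch below illustrates a check for this on made-up data: `toy_sleep` and `toy_activity` are hypothetical, and `inner_join()` from dplyr is used here as an alternative to `merge()` rather than as the method of this analysis.

```r
library(dplyr)

# Hypothetical sleep records; the second and third rows are exact duplicates.
toy_sleep <- data.frame(
  Id = c(1, 1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/13/16", "04/12/16"),
  TotalMinutesAsleep = c(327, 384, 384, 412)
)

# Hypothetical activity records; participant 3 has no sleep data.
toy_activity <- data.frame(
  Id = c(1, 1, 2, 3),
  date = c("04/12/16", "04/13/16", "04/12/16", "04/12/16"),
  TotalSteps = c(13162, 10735, 9762, 5000)
)

# Count fully duplicated rows, then drop them before joining.
n_dupes <- sum(duplicated(toy_sleep))
toy_sleep_clean <- distinct(toy_sleep)

# Inner join on both keys: only Id/date pairs present in both tables survive.
toy_merged <- inner_join(toy_sleep_clean, toy_activity, by = c("Id", "date"))
```

Comparing `nrow(toy_merged)` with the row counts of the inputs is a quick way to see how many records the join dropped.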
ggplot(data = activity, aes(x = TotalSteps, y = Calories)) +
geom_point(color = "#FF69B4", size = 3, alpha = 0.7) + # Soft pink color for points
geom_smooth(method = "lm", color = "#8A2BE2", se = TRUE, # Purple line with confidence interval
linetype = "dashed", linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
labs(
title = "Total Steps vs. Calories Burned",
x = "Total Steps",
y = "Calories Burned",
caption = "Source: FitBit Fitness Tracker Data, Extracted By Adenola Yusuf"
) +
theme_minimal(base_size = 14) + # Clean, professional theme
theme(
plot.title = element_text(face = "bold", size = 20, color = "#4B0082"),
plot.subtitle = element_text(face = "italic", size = 15, color = "#A020F0"),
axis.title.x = element_text(face = "bold", color = "#4B0082", size = 12),
axis.title.y = element_text(face = "bold", color = "#4B0082", size = 12),
plot.caption = element_text(face = "italic", size = 10, color = "gray40"),
panel.grid.major = element_line(color = "gray85", linetype = "dotted"),
panel.grid.minor = element_blank(),
axis.text = element_text(color = "gray30"),
plot.background = element_rect(fill = "#FDF5E6", color = NA) # Light cream background
) +
annotate(
"text", x = max(activity$TotalSteps) * 0.7, y = max(activity$Calories) * 0.9,
label = "Positive correlation\nbetween steps and calories",
color = "#8B0000", size = 4, fontface = "italic"
) +
annotate(
"curve", x = max(activity$TotalSteps) * 0.75, y = max(activity$Calories) * 0.95,
xend = max(activity$TotalSteps) * 0.9, yend = max(activity$Calories),
color = "#8B0000", arrow = arrow(length = unit(0.02, "npc"))
)
There is a positive correlation between Total Steps and Calories, as expected: greater activity levels are associated with higher calorie expenditure.
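The strength of this relationship can also be expressed as a number. The sketch below computes a Pearson correlation and a simple linear fit on the six rows shown by `head(activity)` earlier. Those rows all come from one participant, so the result only illustrates the calculation and is not a finding about the full dataset. The names `first_rows`, `r_steps_cal` and `slope` are introduced here for illustration.

```r
# The six rows printed by head(activity); one participant only.
first_rows <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  Calories   = c(1985, 1797, 1776, 1745, 1863, 1728)
)

# Pearson correlation between daily steps and calories.
r_steps_cal <- cor(first_rows$TotalSteps, first_rows$Calories)

# Simple linear regression; the slope is the extra calories per additional step.
fit <- lm(Calories ~ TotalSteps, data = first_rows)
slope <- unname(coef(fit)["TotalSteps"])
```

Running the same two calls on the full `activity` data frame would give the figures that correspond to the plot.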
ggplot(data = sleep, aes(x = TotalMinutesAsleep, y = TotalTimeInBed)) +
geom_point(color = "#FFB6C1", size = 3, alpha = 0.7) + # Light pink points
geom_smooth(method = "lm", color = "#9400D3", se = TRUE, # Dark purple regression line with confidence interval
linetype = "dashed", linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
labs(
title = "Total Minutes Asleep vs. Total Time in Bed",
x = "Total Minutes Asleep",
y = "Total Time in Bed",
caption = "Source: FitBit Fitness Tracker Data."
) +
theme_minimal(base_size = 14) + # Clean, minimal theme
theme(
plot.title = element_text(face = "bold", size = 20, color = "#4B0082"),
plot.subtitle = element_text(face = "italic", size = 15, color = "#A020F0"),
axis.title.x = element_text(face = "bold", color = "#4B0082", size = 12),
axis.title.y = element_text(face = "bold", color = "#4B0082", size = 12),
plot.caption = element_text(face = "italic", size = 10, color = "gray40"),
panel.grid.major = element_line(color = "gray85", linetype = "dotted"),
panel.grid.minor = element_blank(),
axis.text = element_text(color = "gray30"),
plot.background = element_rect(fill = "#FFF5EE", color = NA) # Light pinkish cream background
) +
annotate(
"text", x = max(sleep$TotalMinutesAsleep) * 0.6, y = max(sleep$TotalTimeInBed) * 0.9,
label = "Direct correlation\nbetween sleep and time in bed",
color = "#800080", size = 4, fontface = "italic"
) +
annotate(
"curve", x = max(sleep$TotalMinutesAsleep) * 0.65, y = max(sleep$TotalTimeInBed) * 0.95,
xend = max(sleep$TotalMinutesAsleep) * 0.8, yend = max(sleep$TotalTimeInBed) * 1.1,
color = "#800080", arrow = arrow(length = unit(0.02, "npc"))
)
## `geom_smooth()` using formula = 'y ~ x'
The relationship between Total Minutes Asleep and Total Time in Bed appears to be linear: the more time users spend in bed, the more they sleep. For Bellabeat users who want to sleep longer, notifications that encourage an earlier bedtime could therefore be beneficial.
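One way to look beyond the linear trend is to compute how much of the time in bed is actually spent asleep. The sketch below uses the six rows shown by `head(sleep)` earlier. The column names `MinutesAwakeInBed` and `SleepEfficiency` are introduced here for illustration and are not part of the dataset.

```r
# The six rows printed by head(sleep); one participant only.
first_sleep <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  TotalTimeInBed     = c(346, 407, 442, 367, 712, 320)
)

# Minutes spent in bed but not asleep.
first_sleep$MinutesAwakeInBed <- first_sleep$TotalTimeInBed - first_sleep$TotalMinutesAsleep

# Share of time in bed spent asleep (1 would mean asleep the whole time).
first_sleep$SleepEfficiency <- first_sleep$TotalMinutesAsleep / first_sleep$TotalTimeInBed
```

Users with a low efficiency value spend a large part of their time in bed awake, which could be a more specific trigger for sleep advice than duration alone.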
Next, let’s examine how intensity varies by hour of the day:
int_new <- intensities %>%
group_by(time) %>%
drop_na() %>%
summarise(mean_total_int = mean(TotalIntensity))
ggplot(data = int_new, aes(x = time, y = mean_total_int)) +
geom_bar(stat = "identity", fill = "#1E90FF", color = "#4682B4", width = 0.7) + # Soft blue bars with a darker outline
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size = 10), # Rotate x-axis labels
axis.title.x = element_text(face = "bold", size = 12, color = "#4B0082"), # Bold x-axis title
axis.title.y = element_text(face = "bold", size = 12, color = "#4B0082"), # Bold y-axis title
plot.title = element_text(face = "bold", size = 20, color = "#4B0082"), # Bold title
plot.subtitle = element_text(face = "italic", size = 15, color = "#A020F0"), # Subtitle
plot.caption = element_text(face = "italic", size = 10, color = "gray40"), # Caption
panel.grid.major = element_line(color = "gray85", linetype = "dotted"), # Major grid lines
panel.grid.minor = element_blank(), # Remove minor grid lines
plot.background = element_rect(fill = "#FFF5EE", color = NA), # Light pink background
axis.text = element_text(color = "gray30")) + # Axis text color
labs(
title = "Average Total Intensity vs. Time",
x = "Time of Day",
y = "Average Total Intensity"
)
Upon visualizing the Total Intensity on an hourly basis, I discovered that individuals tend to be more active between 5 AM and 10 PM. The peak activity occurs between 5 PM and 7 PM, which likely corresponds to users heading to the gym or taking a walk after work. This timeframe presents an opportunity for the Bellabeat app to remind and encourage users to engage in running or walking.
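The peak hours can also be read from the summarised table rather than from the chart. The sketch below ranks hourly averages on a small made-up table: `toy_int` has the same two columns as `int_new`, but its values are hypothetical.

```r
library(dplyr)

# Hypothetical hourly averages with the same columns as int_new.
toy_int <- data.frame(
  time = c("06:00:00", "12:00:00", "17:00:00", "18:00:00", "19:00:00", "23:00:00"),
  mean_total_int = c(5, 15, 20, 22, 21, 4)
)

# Sort from most to least intense and keep the three busiest hours.
top_hours <- toy_int %>%
  arrange(desc(mean_total_int)) %>%
  head(3)
```

Applied to `int_new`, the same two steps would list the hours with the highest average intensity.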
Next, let’s explore the relationship between Total Minutes Asleep and Sedentary Minutes.
ggplot(data = merged_data, aes(x = TotalMinutesAsleep, y = SedentaryMinutes)) +
geom_point(color = "#1E90FF", size = 3, alpha = 0.7) + # Soft blue points
geom_smooth(method = "lm", color = "#FF69B4", se = TRUE, # Pink regression line
linetype = "dashed", linewidth = 1) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
labs(
title = "Minutes Asleep vs. Sedentary Minutes",
x = "Total Minutes Asleep",
y = "Sedentary Minutes"
) +
theme_minimal(base_size = 14) + # Minimal theme for clarity
theme(
plot.title = element_text(face = "bold", size = 20, color = "#4B0082"),
plot.subtitle = element_text(face = "italic", size = 15, color = "#A020F0"),
axis.title.x = element_text(face = "bold", color = "#4B0082", size = 12),
axis.title.y = element_text(face = "bold", color = "#4B0082", size = 12),
plot.caption = element_text(face = "italic", size = 10, color = "gray40"),
panel.grid.major = element_line(color = "gray85", linetype = "dotted"),
panel.grid.minor = element_blank(),
axis.text = element_text(color = "gray30"),
plot.background = element_rect(fill = "#FFF5EE", color = NA) # Light pink background
) +
annotate(
"text", x = max(merged_data$TotalMinutesAsleep) * 0.7,
y = max(merged_data$SedentaryMinutes) * 0.9,
label = "Correlation observed\nbetween sleep and sedentary behavior",
color = "#800080", size = 4, fontface = "italic"
) +
annotate(
"curve", x = max(merged_data$TotalMinutesAsleep) * 0.75,
y = max(merged_data$SedentaryMinutes) * 0.95,
xend = max(merged_data$TotalMinutesAsleep) * 0.9,
yend = max(merged_data$SedentaryMinutes),
color = "#800080", arrow = arrow(length = unit(0.02, "npc"))
)
## `geom_smooth()` using formula = 'y ~ x'
This analysis reveals a clear negative relationship between Sedentary Minutes and Sleep Duration.
To enhance sleep quality, the Bellabeat app could suggest that users reduce their sedentary time. However, it is important to emphasize that these insights should be substantiated with further data, as correlation does not imply causation.
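A correlation test gives the direction and strength of this relationship together with a p-value. The sketch below runs `cor.test()` on the six rows shown by `head(merged_data)` earlier. With six observations from a single participant it only demonstrates the call, and the names `first_merged`, `r_sleep_sed` and `p_value` are introduced here for illustration.

```r
# The six rows printed by head(merged_data); one participant only.
first_merged <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  SedentaryMinutes   = c(728, 776, 726, 773, 539, 775)
)

# Pearson correlation test between sleep duration and sedentary time.
ct <- cor.test(first_merged$TotalMinutesAsleep, first_merged$SedentaryMinutes)

# Correlation coefficient (negative means more sleep goes with less sedentary time).
r_sleep_sed <- unname(ct$estimate)

# p-value for the null hypothesis of zero correlation.
p_value <- ct$p.value
```

Even a strong coefficient on the full data would describe an association only, in line with the caution above about causation.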
The collection of data pertaining to activity, sleep, stress, and reproductive health has enabled Bellabeat to empower women with valuable insights into their health and habits. Since its establishment in 2013, Bellabeat has experienced rapid growth and has effectively positioned itself as a technology-driven wellness company for women.
Based on my analysis of the Fitbit Fitness Tracker data, I have identified several insights that could inform and enhance Bellabeat’s marketing strategy.
The hourly intensity and sedentary-time data are consistent with a target audience of women in full-time employment who spend much of the day at their computers or in meetings. As the activity type analysis shows, most of their recorded activity is light, so they would benefit from raising their daily activity levels. This demographic may respond well to education on developing sustainable healthy habits and to motivation that fosters ongoing engagement.
The dataset does not specify participant gender, so I assume that genders are represented in a balanced way. This assumption cannot be verified from the data.
The Bellabeat app transcends conventional fitness tracking applications; it serves as a supportive companion that empowers women to harmonize their personal and professional lives while cultivating healthy habits. By providing educational resources and personalized recommendations, the app motivates users to integrate wellness into their daily routines, ultimately fostering a balanced and fulfilling lifestyle.
The average total steps per day among users is 7,638, slightly below the 8,000-step level that research links to substantial health benefits. Research from the CDC indicates that, compared with taking 4,000 steps per day, taking 8,000 steps is associated with a 51% lower risk of all-cause mortality, and taking 12,000 steps with a 65% lower risk. To promote healthier lifestyles, Bellabeat can implement features that encourage users to reach at least 8,000 steps daily, accompanied by informative content on the associated health benefits.
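An average can hide how often individual users actually reach the goal. The sketch below computes, per user, the share of days with at least 8,000 steps. The data frame `toy_steps` and its values are hypothetical, and `goal` is simply the 8,000-step threshold mentioned above.

```r
library(dplyr)

# Hypothetical daily step counts for two users.
toy_steps <- data.frame(
  Id = c(1, 1, 1, 2, 2, 2),
  TotalSteps = c(13162, 10735, 7000, 3000, 8500, 4000)
)

# Daily step goal taken from the recommendation above.
goal <- 8000

# Share of each user's days that met the goal.
share_by_user <- toy_steps %>%
  group_by(Id) %>%
  summarise(share_met = mean(TotalSteps >= goal), .groups = "drop")
```

Users with a low `share_met` would be the natural audience for step reminders.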
For users aiming to manage their weight, the app could integrate functionality to track daily caloric intake. Providing personalized meal suggestions for low-calorie lunches and dinners can empower users to make healthier dietary choices, reinforcing their weight loss goals.
To enhance sleep quality, it would be beneficial for the app to incorporate reminders prompting users to establish a regular bedtime. This could be supplemented by educational resources on sleep hygiene and the importance of maintaining a consistent sleep schedule.
Physical activity peaks between 5 PM and 7 PM, likely because users work out or go for a walk after work. Bellabeat can capitalize on this window by sending motivational reminders that encourage users to fit in a run, a walk, or a gym session.
Additionally, for users seeking to improve their sleep, the app can recommend strategies to reduce sedentary time, highlighting the importance of physical activity for overall well-being.
Thank you for your interest in my Bellabeat case study! This project marks my first experience using R, and I welcome any feedback or recommendations for improvement. Your insights will be invaluable as I continue to refine my analytical skills.