About Bellabeat

Bellabeat is a high-tech company that makes health-focused products for women. Although it is a small business, it has been successful and has the potential to become a significant competitor in the global smart device market. Urška Sršen, co-founder and Chief Creative Officer, believes that analyzing smart device usage data can reveal new growth opportunities for the company.

Analysis Questions

Business Objective

This analysis aims to identify growth opportunities and provide recommendations for optimizing Bellabeat’s marketing strategy by examining trends in smart device usage.

Loading Packages

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)

Importing datasets and Reviewing

For this project, I will use FitBit Fitness Tracker Data.

activity <- read.csv("dailyActivity_merged.csv")         # Daily activity data
calories <- read.csv("hourlyCalories_merged.csv")        # Hourly calorie data
intensities <- read.csv("hourlyIntensities_merged.csv")  # Hourly intensity data
sleep <- read.csv("sleepDay_merged.csv")                 # Daily sleep data
weight <- read.csv("weightLogInfo_merged.csv")           # Weight log data

I’ve reviewed the data in Excel and now just need to verify that everything imported successfully by using the head() function.

head(activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728
head(calories)
##           Id          ActivityHour Calories
## 1 1503960366 4/12/2016 12:00:00 AM       81
## 2 1503960366  4/12/2016 1:00:00 AM       61
## 3 1503960366  4/12/2016 2:00:00 AM       59
## 4 1503960366  4/12/2016 3:00:00 AM       47
## 5 1503960366  4/12/2016 4:00:00 AM       48
## 6 1503960366  4/12/2016 5:00:00 AM       48
head(intensities)
##           Id          ActivityHour TotalIntensity AverageIntensity
## 1 1503960366 4/12/2016 12:00:00 AM             20         0.333333
## 2 1503960366  4/12/2016 1:00:00 AM              8         0.133333
## 3 1503960366  4/12/2016 2:00:00 AM              7         0.116667
## 4 1503960366  4/12/2016 3:00:00 AM              0         0.000000
## 5 1503960366  4/12/2016 4:00:00 AM              0         0.000000
## 6 1503960366  4/12/2016 5:00:00 AM              0         0.000000
head(sleep)
##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320
head(weight)
##           Id                  Date WeightKg WeightPounds Fat   BMI
## 1 1503960366  5/2/2016 11:59:59 PM     52.6     115.9631  22 22.65
## 2 1503960366  5/3/2016 11:59:59 PM     52.6     115.9631  NA 22.65
## 3 1927972279  4/13/2016 1:08:52 AM    133.5     294.3171  NA 47.54
## 4 2873212765 4/21/2016 11:59:59 PM     56.7     125.0021  NA 21.45
## 5 2873212765 5/12/2016 11:59:59 PM     57.3     126.3249  NA 21.69
## 6 4319703577 4/17/2016 11:59:59 PM     72.4     159.6147  25 27.45
##   IsManualReport        LogId
## 1           True 1.462234e+12
## 2           True 1.462320e+12
## 3          False 1.460510e+12
## 4           True 1.461283e+12
## 5           True 1.463098e+12
## 6           True 1.460938e+12
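Beyond eyeballing head(), a few structural checks can confirm how the columns were imported. The sketch below runs on a small toy data frame (the name toy is hypothetical; its values are copied from the hourly calories preview above) rather than on the real files, so it only illustrates the idea.

```r
# Toy stand-in for an imported hourly table (values taken from the preview above).
toy <- data.frame(
  Id = c(1503960366, 1503960366),
  ActivityHour = c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM"),
  Calories = c(81L, 61L),
  stringsAsFactors = FALSE
)

# Class of every column: timestamps read from CSV arrive as character strings.
col_classes <- sapply(toy, class)
print(col_classes)

# Count missing values per column and fully duplicated rows.
na_counts <- colSums(is.na(toy))
n_duplicates <- sum(duplicated(toy))
print(na_counts)
print(n_duplicates)
```

The same three checks could be applied to each imported dataset before any cleaning.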

The timestamp columns were imported as text rather than as date-time values. Before the analysis, I need to convert them to date-time format and split them into separate date and time columns.

Fixing formatting

# intensities: parse the 12-hour timestamp, then split it into time and date strings
intensities$ActivityHour <- as.POSIXct(intensities$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
intensities$time <- format(intensities$ActivityHour, format = "%H:%M:%S")
intensities$date <- format(intensities$ActivityHour, format = "%m/%d/%y")
# calories: same timestamp layout as intensities
calories$ActivityHour <- as.POSIXct(calories$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
calories$time <- format(calories$ActivityHour, format = "%H:%M:%S")
calories$date <- format(calories$ActivityHour, format = "%m/%d/%y")
# activity: date only, no time component
activity$ActivityDate <- as.POSIXct(activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
activity$date <- format(activity$ActivityDate, format = "%m/%d/%y")
# sleep: timestamp is always midnight, so only the date is kept
sleep$SleepDay <- as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")
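A conversion like this fails silently: any string that does not match the format becomes NA. The sketch below shows one way to check for that on a toy vector. It deliberately differs from the code above in two ways that are my assumptions, not the author's method: it uses a fixed "UTC" time zone instead of Sys.timezone(), and it stores the date as a Date object rather than a formatted string.

```r
# Toy timestamps in the same layout as the CSV files, plus one malformed entry.
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM", "not a date")

# Parse with a fixed time zone so results do not depend on the machine.
# Note: %p (AM/PM) matching is locale-dependent.
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Strings that do not match the format come back as NA; count them.
n_failed <- sum(is.na(parsed))

# Split into a 24-hour time string and a proper Date.
time_of_day <- format(parsed, "%H:%M:%S")
day <- as.Date(parsed, tz = "UTC")

print(n_failed)
print(time_of_day)
print(day)
```

A non-zero failure count after converting the real columns would signal rows that need a closer look before they are grouped by hour or joined by date.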

With everything now prepared, I can begin exploring the datasets.

Exploring and summarizing data

n_distinct(activity$Id)
## [1] 33
n_distinct(calories$Id)
## [1] 33
n_distinct(intensities$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 24
n_distinct(weight$Id)
## [1] 8

This information provides insight into the number of participants across each dataset.

The activity, calories, and intensities datasets each contain data from 33 participants, while the sleep dataset includes 24 participants, and the weight dataset only has data for 8 participants. The limited number in the weight dataset is insufficient to form reliable recommendations or conclusions.
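Counting distinct Ids shows how many participants each dataset has, but not whether they are the same people. A minimal sketch of an overlap check follows, using made-up Id vectors (all three vectors are hypothetical and much smaller than the real data).

```r
# Hypothetical participant Ids per dataset.
activity_ids <- c(1, 2, 3, 4, 5)
sleep_ids    <- c(1, 2, 4)
weight_ids   <- c(2, 6)

# Participants present in both activity and sleep data.
in_both <- intersect(activity_ids, sleep_ids)

# Participants with activity data but no sleep records.
no_sleep <- setdiff(activity_ids, sleep_ids)

# Participants who logged weight but have no activity data.
weight_only <- setdiff(weight_ids, activity_ids)

# Share of activity participants who also have sleep data.
coverage <- length(in_both) / length(unique(activity_ids))

print(in_both)
print(no_sleep)
print(weight_only)
print(coverage)
```

On the real data, the inputs would be unique(activity$Id), unique(sleep$Id), and unique(weight$Id).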

Next, let’s review the summary statistics of the datasets:

Data Summary and Statistics

# activity
activity %>%  
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes, Calories) %>%
  summary()
##    TotalSteps    TotalDistance    SedentaryMinutes    Calories   
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0   Min.   :   0  
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8   1st Qu.:1828  
##  Median : 7406   Median : 5.245   Median :1057.5   Median :2134  
##  Mean   : 7638   Mean   : 5.490   Mean   : 991.2   Mean   :2304  
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5   3rd Qu.:2793  
##  Max.   :36019   Max.   :28.030   Max.   :1440.0   Max.   :4900
# explore num of active minutes per category
activity %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  summary()
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0       
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127.0       
##  Median :  4.00    Median :  6.00      Median :199.0       
##  Mean   : 21.16    Mean   : 13.56      Mean   :192.8       
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264.0       
##  Max.   :210.00    Max.   :143.00      Max.   :518.0
# calories
calories %>%
  select(Calories) %>%
  summary()
##     Calories     
##  Min.   : 42.00  
##  1st Qu.: 63.00  
##  Median : 83.00  
##  Mean   : 97.39  
##  3rd Qu.:108.00  
##  Max.   :948.00
# sleep
sleep %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()
##  TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
##  Min.   :1.000     Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:1.000     1st Qu.:361.0      1st Qu.:403.0  
##  Median :1.000     Median :433.0      Median :463.0  
##  Mean   :1.119     Mean   :419.5      Mean   :458.6  
##  3rd Qu.:1.000     3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :3.000     Max.   :796.0      Max.   :961.0
# weight
weight %>%
  select(WeightKg, BMI) %>%
  summary()
##     WeightKg           BMI       
##  Min.   : 52.60   Min.   :21.45  
##  1st Qu.: 61.40   1st Qu.:23.96  
##  Median : 62.50   Median :24.39  
##  Mean   : 72.04   Mean   :25.19  
##  3rd Qu.: 85.05   3rd Qu.:25.56  
##  Max.   :133.50   Max.   :47.54

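Key insights from the summary:

Average sedentary time is 991 minutes, about 16.5 hours per day, which leaves considerable room for reduction.
Most recorded activity is light: participants average 192.8 lightly active minutes per day, against 21.2 very active and 13.6 fairly active minutes.
Participants average 7,638 steps per day.
Participants typically sleep once per day for about 7 hours (419.5 minutes on average) and spend about 39 more minutes in bed awake.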
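The mean minutes reported by summary() are easier to compare as shares of the tracked day. The sketch below does that arithmetic with the four means copied from the output above. The means add up to fewer than 1,440 minutes, so the shares are relative to the average tracked time, not to a full 24 hours.

```r
# Mean daily minutes per activity level, copied from the summary() output above.
mean_minutes <- c(
  VeryActive    = 21.16,
  FairlyActive  = 13.56,
  LightlyActive = 192.8,
  Sedentary     = 991.2
)

# Share of the average tracked time spent at each level.
share <- mean_minutes / sum(mean_minutes)
print(round(100 * share, 1))

# Mean sedentary time expressed in hours.
sedentary_hours <- mean_minutes[["Sedentary"]] / 60
print(round(sedentary_hours, 1))
```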

Merging DataSet

Before visualizing the data, I will merge the activity and sleep datasets. merge() performs an inner join by default, so joining on the ‘Id’ and ‘date’ columns (the date column was created during the formatting step above) keeps only the participant-days that appear in both datasets.

merged_data <- merge(sleep, activity, by=c('Id', 'date'))
head(merged_data)
##           Id     date   SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 04/12/16 2016-04-12                 1                327
## 2 1503960366 04/13/16 2016-04-13                 2                384
## 3 1503960366 04/15/16 2016-04-15                 1                412
## 4 1503960366 04/16/16 2016-04-16                 2                340
## 5 1503960366 04/17/16 2016-04-17                 1                700
## 6 1503960366 04/19/16 2016-04-19                 1                304
##   TotalTimeInBed ActivityDate TotalSteps TotalDistance TrackerDistance
## 1            346   2016-04-12      13162          8.50            8.50
## 2            407   2016-04-13      10735          6.97            6.97
## 3            442   2016-04-15       9762          6.28            6.28
## 4            367   2016-04-16      12669          8.16            8.16
## 5            712   2016-04-17       9705          6.48            6.48
## 6            320   2016-04-19      15506          9.88            9.88
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.14                     1.26
## 4                        0               2.71                     0.41
## 5                        0               3.19                     0.78
## 6                        0               3.53                     1.32
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                2.83                       0                29
## 4                5.04                       0                36
## 5                2.51                       0                38
## 6                5.03                       0                50
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  34                  209              726     1745
## 4                  10                  221              773     1863
## 5                  20                  164              539     1728
## 6                  31                  264              775     2035
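Because an inner join silently drops unmatched rows, it can help to compare row counts before and after merging. The sketch below uses two toy tables (sleep_toy and activity_toy are hypothetical names, and the pairing of values with Ids is illustrative) to contrast the default inner join with a left join that keeps every activity day.

```r
# Toy sleep and activity tables keyed by Id and date.
sleep_toy <- data.frame(
  Id = c(1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/12/16"),
  TotalMinutesAsleep = c(327, 384, 412)
)
activity_toy <- data.frame(
  Id = c(1, 1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/14/16", "04/12/16"),
  TotalSteps = c(13162, 10735, 10460, 9762)
)

# Duplicate keys would multiply rows in the join; 0 means none were found.
dup_keys <- anyDuplicated(sleep_toy[c("Id", "date")])

# Inner join (the merge() default): only participant-days present in both tables.
inner <- merge(sleep_toy, activity_toy, by = c("Id", "date"))

# Left join: keep every activity day, with NA where no sleep was recorded.
left <- merge(activity_toy, sleep_toy, by = c("Id", "date"), all.x = TRUE)

# Number of activity days lost by the inner join.
n_dropped <- nrow(activity_toy) - nrow(inner)

print(dup_keys)
print(nrow(inner))
print(nrow(left))
print(n_dropped)
```

Which join is appropriate depends on the question: the inner join suits sleep-versus-activity comparisons, while the left join keeps the full activity record.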

Visualizations

ggplot(data = activity, aes(x = TotalSteps, y = Calories)) + 
  geom_point(color = "#FF69B4", size = 3, alpha = 0.7) +     # Soft pink color for points
  geom_smooth(method = "lm", color = "#8A2BE2", se = TRUE,   # Purple line with confidence interval
              linetype = "dashed", linewidth = 1) + 
  labs(
    title = "Total Steps vs. Calories Burned",
    x = "Total Steps",
    y = "Calories Burned",
    caption = "Source: FitBit Fitness Tracker Data, Extracted By Adenola Yusuf"
  ) +
  theme_minimal(base_size = 14) +                           # Clean, professional theme
  theme(
    plot.title = element_text(face = "bold", size = 20, color = "#4B0082"),
    plot.subtitle = element_text(face = "italic", size = 15, color = "#A020F0"),
    axis.title.x = element_text(face = "bold", color = "#4B0082", size = 12),
    axis.title.y = element_text(face = "bold", color = "#4B0082", size = 12),
    plot.caption = element_text(face = "italic", size = 10, color = "gray40"),
    panel.grid.major = element_line(color = "gray85", linetype = "dotted"),
    panel.grid.minor = element_blank(),
    axis.text = element_text(color = "gray30"),
    plot.background = element_rect(fill = "#FDF5E6", color = NA)  # Light cream background
  ) +
  annotate(
    "text", x = max(activity$TotalSteps) * 0.7, y = max(activity$Calories) * 0.9,
    label = "Positive correlation\nbetween steps and calories",
    color = "#8B0000", size = 4, fontface = "italic"
  ) + 
  annotate(
    "curve", x = max(activity$TotalSteps) * 0.75, y = max(activity$Calories) * 0.95,
    xend = max(activity$TotalSteps) * 0.9, yend = max(activity$Calories),
    color = "#8B0000", arrow = arrow(length = unit(0.02, "npc"))
  )

Total Steps and Calories are positively correlated, as expected: more active days are associated with higher calorie expenditure.
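The visual impression can be backed by a number. As a rough sketch, the code below computes the Pearson correlation and a simple linear fit on the six activity rows shown in the head(activity) preview. Six days from one participant are far too few to generalize, so this only illustrates the calculation; the full activity data frame would be the real input.

```r
# Steps and calories from the six rows in the head(activity) preview.
steps    <- c(13162, 10735, 10460, 9762, 12669, 9705)
calories <- c(1985, 1797, 1776, 1745, 1863, 1728)

# Pearson correlation coefficient between the two variables.
r <- cor(steps, calories)

# Linear fit: the slope estimates extra calories burned per additional step.
fit <- lm(calories ~ steps)
slope <- coef(fit)[["steps"]]

print(r)
print(slope)
```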

ggplot(data = sleep, aes(x = TotalMinutesAsleep, y = TotalTimeInBed)) + 
  geom_point(color = "#FFB6C1", size = 3, alpha = 0.7) +         # Light pink points
  geom_smooth(method = "lm", color = "#9400D3", se = TRUE,       # Dark purple regression line with confidence interval
              linetype = "dashed", linewidth = 1) + 
  labs(
    title = "Total Minutes Asleep vs. Total Time in Bed",
    x = "Total Minutes Asleep",
    y = "Total Time in Bed",
    caption = "Source: FitBit Fitness Tracker Data."
  ) +
  theme_minimal(base_size = 14) +                                 # Clean, minimal theme
  theme(
    plot.title = element_text(face = "bold", size = 20, color = "#4B0082"),
    plot.subtitle = element_text(face = "italic", size = 15, color = "#A020F0"),
    axis.title.x = element_text(face = "bold", color = "#4B0082", size = 12),
    axis.title.y = element_text(face = "bold", color = "#4B0082", size = 12),
    plot.caption = element_text(face = "italic", size = 10, color = "gray40"),
    panel.grid.major = element_line(color = "gray85", linetype = "dotted"),
    panel.grid.minor = element_blank(),
    axis.text = element_text(color = "gray30"),
    plot.background = element_rect(fill = "#FFF5EE", color = NA)  # Light pinkish cream background
  ) +
  annotate(
    "text", x = max(sleep$TotalMinutesAsleep) * 0.6, y = max(sleep$TotalTimeInBed) * 0.9,
    label = "Direct correlation\nbetween sleep and time in bed",
    color = "#800080", size = 4, fontface = "italic"
  ) + 
  annotate(
    "curve", x = max(sleep$TotalMinutesAsleep) * 0.65, y = max(sleep$TotalTimeInBed) * 0.95,
    xend = max(sleep$TotalMinutesAsleep) * 0.8, yend = max(sleep$TotalTimeInBed) * 1.1,
    color = "#800080", arrow = arrow(length = unit(0.02, "npc"))
  )
## `geom_smooth()` using formula = 'y ~ x'

Total Minutes Asleep and Total Time in Bed follow a nearly linear relationship: more time in bed goes with more time asleep. If Bellabeat users want to improve their sleep, bedtime reminders in the app could therefore be beneficial.
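One way to look past the near-linear trend is to compute how much of the time in bed is actually spent asleep. The sketch below derives that ratio, which I call efficiency here (a name of my choosing, not a column in the dataset), from the six rows in the head(sleep) preview.

```r
# Minutes asleep and minutes in bed from the six rows in the head(sleep) preview.
asleep <- c(327, 384, 412, 340, 700, 304)
in_bed <- c(346, 407, 442, 367, 712, 320)

# Fraction of time in bed spent asleep (hypothetical "efficiency" measure).
efficiency <- asleep / in_bed

# Minutes spent in bed but awake.
awake_in_bed <- in_bed - asleep

print(round(efficiency, 3))
print(awake_in_bed)
```

Nights with a low ratio would be the ones where a bedtime or wind-down notification might matter most.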

Next, let us examine the intensity data over time, on an hourly basis.

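int_new <- intensities %>%
  drop_na() %>%                                     # drop incomplete rows before grouping
  group_by(time) %>%                                # one group per hour of the day
  summarise(mean_total_int = mean(TotalIntensity))  # average intensity for each hour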

ggplot(data = int_new, aes(x = time, y = mean_total_int)) + 
  geom_bar(stat = "identity", fill = "#1E90FF", color = "#4682B4", width = 0.7) +  # Soft blue bars with a darker outline
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size = 10),  # Rotate x-axis labels
        axis.title.x = element_text(face = "bold", size = 12, color = "#4B0082"),  # Bold x-axis title
        axis.title.y = element_text(face = "bold", size = 12, color = "#4B0082"),  # Bold y-axis title
        plot.title = element_text(face = "bold", size = 20, color = "#4B0082"),     # Bold title
        plot.subtitle = element_text(face = "italic", size = 15, color = "#A020F0"), # Subtitle
        plot.caption = element_text(face = "italic", size = 10, color = "gray40"),   # Caption
        panel.grid.major = element_line(color = "gray85", linetype = "dotted"),      # Major grid lines
        panel.grid.minor = element_blank(),                                         # Remove minor grid lines
        plot.background = element_rect(fill = "#FFF5EE", color = NA),              # Light pink background
        axis.text = element_text(color = "gray30")) +                               # Axis text color
  labs(
    title = "Average Total Intensity vs. Time",
    x = "Time of Day",
    y = "Average Total Intensity"
  )

Upon visualizing the Total Intensity on an hourly basis, I discovered that individuals tend to be more active between 5 AM and 10 PM. The peak activity occurs between 5 PM and 7 PM, which likely corresponds to users heading to the gym or taking a walk after work. This timeframe presents an opportunity for the Bellabeat app to remind and encourage users to engage in running or walking.
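The peak hours can also be read off programmatically instead of from the chart. The sketch below uses base R on a tiny made-up table (the intensity values are hypothetical and chosen only to illustrate the steps): it averages intensity per hour and then picks the hour with the highest mean.

```r
# Hypothetical hourly intensity records for three hours of the day.
toy <- data.frame(
  time = c("08:00:00", "08:00:00", "18:00:00", "18:00:00", "23:00:00", "23:00:00"),
  TotalIntensity = c(10, 14, 22, 26, 3, 5)
)

# Mean intensity per hour, equivalent to group_by() + summarise().
hourly <- aggregate(TotalIntensity ~ time, data = toy, FUN = mean)

# Hour with the highest average intensity.
peak_hour <- as.character(hourly$time[which.max(hourly$TotalIntensity)])

# Hours ranked from most to least active.
ranked <- hourly[order(-hourly$TotalIntensity), ]

print(peak_hour)
print(ranked)
```

Applied to int_new, the same ranking would list the hours in which a reminder is most likely to reach users who are already active.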

Next, let’s explore the relationship between Total Minutes Asleep and Sedentary Minutes.

ggplot(data = merged_data, aes(x = TotalMinutesAsleep, y = SedentaryMinutes)) + 
  geom_point(color = "#1E90FF", size = 3, alpha = 0.7) +              # Soft blue points
  geom_smooth(method = "lm", color = "#FF69B4", se = TRUE,            # Pink regression line
              linetype = "dashed", linewidth = 1) + 
  labs(
    title = "Minutes Asleep vs. Sedentary Minutes",
    x = "Total Minutes Asleep",
    y = "Sedentary Minutes"
    ) +
  theme_minimal(base_size = 14) +                                    # Minimal theme for clarity
  theme(
    plot.title = element_text(face = "bold", size = 20, color = "#4B0082"),
    plot.subtitle = element_text(face = "italic", size = 15, color = "#A020F0"),
    axis.title.x = element_text(face = "bold", color = "#4B0082", size = 12),
    axis.title.y = element_text(face = "bold", color = "#4B0082", size = 12),
    plot.caption = element_text(face = "italic", size = 10, color = "gray40"),
    panel.grid.major = element_line(color = "gray85", linetype = "dotted"),
    panel.grid.minor = element_blank(),
    axis.text = element_text(color = "gray30"),
    plot.background = element_rect(fill = "#FFF5EE", color = NA)   # Light pink background
  ) +
  annotate(
    "text", x = max(merged_data$TotalMinutesAsleep) * 0.7, 
    y = max(merged_data$SedentaryMinutes) * 0.9,
    label = "Correlation observed\nbetween sleep and sedentary behavior",
    color = "#800080", size = 4, fontface = "italic"
  ) + 
  annotate(
    "curve", x = max(merged_data$TotalMinutesAsleep) * 0.75, 
    y = max(merged_data$SedentaryMinutes) * 0.95,
    xend = max(merged_data$TotalMinutesAsleep) * 0.9, 
    yend = max(merged_data$SedentaryMinutes),
    color = "#800080", arrow = arrow(length = unit(0.02, "npc"))
  )
## `geom_smooth()` using formula = 'y ~ x'

This analysis reveals a clear negative relationship between Sedentary Minutes and Sleep Duration.

To enhance sleep quality, the Bellabeat app could suggest that users reduce their sedentary time. However, it is important to emphasize that these insights should be substantiated with further data, as correlation does not imply causation.
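To judge how strong the relationship is, the correlation can be tested rather than only plotted. The sketch below runs cor.test() on the six rows shown in the head(merged_data) preview. With so few points, all from one participant, the result says nothing reliable about the full dataset; it only shows the calculation that could be run on merged_data, and it still measures association, not causation.

```r
# Minutes asleep and sedentary minutes from the six rows in the merged preview.
asleep    <- c(327, 384, 412, 340, 700, 304)
sedentary <- c(728, 776, 726, 773, 539, 775)

# Pearson correlation with a significance test and a 95% confidence interval.
result <- cor.test(asleep, sedentary)

r  <- unname(result$estimate)  # correlation coefficient
ci <- result$conf.int          # 95% confidence interval for r
p  <- result$p.value           # p-value for the null hypothesis r = 0

print(r)
print(ci)
print(p)
```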

Summary of Recommendations for the Business

The collection of data pertaining to activity, sleep, stress, and reproductive health has enabled Bellabeat to empower women with valuable insights into their health and habits. Since its establishment in 2013, Bellabeat has experienced rapid growth and has effectively positioned itself as a technology-driven wellness company for women.

Based on my analysis of the Fitbit Fitness Tracker data, I have identified several insights that could inform and enhance Bellabeat’s marketing strategy.

Insights on Women’s Health and Activity Patterns

The activity summary shows long sedentary stretches, averaging about 991 minutes per day. This pattern is consistent with full-time work spent at a computer or in meetings, although the dataset contains no employment information. Most recorded activity is light, as the active-minutes summary shows, so there is room for users to raise their daily activity levels and gain greater health benefits. This group may benefit from education on building sustainable healthy habits and from motivation that keeps them engaged.

The dataset does not specify participant gender, so I assume a diverse and balanced mix of participants; this assumption cannot be verified from the data.

Key Messaging for Bellabeat’s Online Campaign

The Bellabeat app transcends conventional fitness tracking applications; it serves as a supportive companion that empowers women to harmonize their personal and professional lives while cultivating healthy habits. By providing educational resources and personalized recommendations, the app motivates users to integrate wellness into their daily routines, ultimately fostering a balanced and fulfilling lifestyle.

Strategic Ideas for the Bellabeat App

Conclusion

Thank you for your interest in my Bellabeat case study! This project is my first experience using R, and I welcome any feedback or recommendations for improvement as I continue to refine my analytical skills.