About Bellabeat

Founded in 2014, Bellabeat is the company that developed one of the first wearables specifically designed for women and has since gone on to create a portfolio of digital products for tracking and improving women's health.

The company focuses on creating innovative health and wellness products for women. Its mission is to empower women to take control of their health by providing them with technology-driven solutions that blend design and function.

Analysis Questions

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Business Task

Determine possible areas for expansion and suggestions for enhancing Bellabeat’s marketing approach based on usage patterns for smart devices.


Loading Necessary Packages for Analysis

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Loading Datasets

The dataset used for this analysis can be found here.

daily_activity <- read_csv("C:/R Code/Bellabeat/dailyActivity_merged.csv")
hourly_steps <- read_csv("C:/R Code/Bellabeat/hourlySteps_merged.csv")
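The absolute paths above only resolve on the author's machine, and `Id` is read as a number even though it is an identifier. The following is a minimal sketch of one alternative, not the method used in this report: it builds the path with `file.path()` and declares `Id` as character through `col_types`. The folder `data_dir` and the file `dailyActivity_sample.csv` are hypothetical stand-ins so the snippet can run anywhere.

```r
library(readr)

# Hypothetical stand-in for the data folder; the report's real files live elsewhere.
data_dir <- tempdir()
sample_path <- file.path(data_dir, "dailyActivity_sample.csv")

# Two rows copied from the output shown in this report, trimmed to three columns.
writeLines(
  c("Id,ActivityDate,TotalSteps",
    "1503960366,4/12/2016,13162",
    "1503960366,4/13/2016,10735"),
  sample_path
)

# Id is a label, not a quantity, so read it as text; other columns are guessed.
daily_sample <- read_csv(sample_path, col_types = cols(Id = col_character()))

# Convert the month/day/year strings the same way the cleaning step does.
daily_sample$ActivityDate <- as.Date(daily_sample$ActivityDate, format = "%m/%d/%Y")

daily_sample
```

Reading `Id` as character avoids the `1.5e+09` display seen in `str()` and rules out accidental arithmetic on identifiers.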

Overview of Datasets

str(daily_activity)
## spc_tbl_ [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                      : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : num [1:940] 13162 10735 10460 9762 12669 ...
##  $ TotalDistance           : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : num [1:940] 728 776 1218 726 773 ...
##  $ Calories                : num [1:940] 1985 1797 1776 1745 1863 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityDate = col_character(),
##   ..   TotalSteps = col_double(),
##   ..   TotalDistance = col_double(),
##   ..   TrackerDistance = col_double(),
##   ..   LoggedActivitiesDistance = col_double(),
##   ..   VeryActiveDistance = col_double(),
##   ..   ModeratelyActiveDistance = col_double(),
##   ..   LightActiveDistance = col_double(),
##   ..   SedentaryActiveDistance = col_double(),
##   ..   VeryActiveMinutes = col_double(),
##   ..   FairlyActiveMinutes = col_double(),
##   ..   LightlyActiveMinutes = col_double(),
##   ..   SedentaryMinutes = col_double(),
##   ..   Calories = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
str(hourly_steps)
## spc_tbl_ [22,099 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id          : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour: chr [1:22099] "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ StepTotal   : num [1:22099] 373 160 151 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityHour = col_character(),
##   ..   StepTotal = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Data Cleaning


Checking for NA values and Duplicates in Each Data Frame

any(is.na(daily_activity))
## [1] FALSE
any(is.na(hourly_steps))
## [1] FALSE
any(duplicated(daily_activity))
## [1] FALSE
any(duplicated(hourly_steps))
## [1] FALSE


No NA values or duplicates found, great!
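The checks above return a single TRUE or FALSE for each table. As a hedged sketch of a slightly finer-grained version, the snippet below counts missing values per column and looks for repeated (Id, ActivityDate) pairs, which `duplicated()` on whole rows would miss if any other column differed. The data frame `toy` is invented purely to exercise the checks and does not come from the dataset.

```r
# Invented toy data standing in for daily_activity (not real observations)
toy <- data.frame(
  Id = c("A", "A", "B", "B"),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-12", "2016-04-12", "2016-04-13")),
  TotalSteps = c(1000, 1200, NA, 500)
)

# Missing values per column rather than one flag for the whole table
na_per_column <- colSums(is.na(toy))

# Rows that repeat an (Id, ActivityDate) pair, even when other columns differ
key_dupes <- duplicated(toy[, c("Id", "ActivityDate")])

na_per_column
sum(key_dupes)
```

In the toy data the two rows for user "A" on the same date are not exact duplicates, so a whole-row check would pass while the key-based check flags one of them.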


I noticed that the date and time columns were read in as character (chr) vectors. They need to be converted to proper date and date-time types before any time-based analysis.

# Change activity date/hour columns to date/time format
daily_activity$ActivityDate <- as.Date(daily_activity$ActivityDate, format="%m/%d/%Y")
hourly_steps$ActivityHour <- as.POSIXct(hourly_steps$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
head(daily_activity)
## # A tibble: 6 × 15
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
##        <dbl> <date>            <dbl>         <dbl>           <dbl>
## 1 1503960366 2016-04-12        13162          8.5             8.5 
## 2 1503960366 2016-04-13        10735          6.97            6.97
## 3 1503960366 2016-04-14        10460          6.74            6.74
## 4 1503960366 2016-04-15         9762          6.28            6.28
## 5 1503960366 2016-04-16        12669          8.16            8.16
## 6 1503960366 2016-04-17         9705          6.48            6.48
## # ℹ 10 more variables: LoggedActivitiesDistance <dbl>,
## #   VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## #   LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## #   VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>
head(hourly_steps)
## # A tibble: 6 × 3
##           Id ActivityHour        StepTotal
##        <dbl> <dttm>                  <dbl>
## 1 1503960366 2016-04-12 00:00:00       373
## 2 1503960366 2016-04-12 01:00:00       160
## 3 1503960366 2016-04-12 02:00:00       151
## 4 1503960366 2016-04-12 03:00:00         0
## 5 1503960366 2016-04-12 04:00:00         0
## 6 1503960366 2016-04-12 05:00:00         0
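Because the conversion above uses `Sys.timezone()`, the parsed timestamps depend on the machine that runs the code. A small sketch of a more reproducible variant follows. The choice of `"UTC"` is an assumption, since the source does not say which zone the device clocks used, and the `%p` (AM/PM) directive is locale-dependent, so this assumes an English-style locale.

```r
# The first two timestamps from hourlySteps_merged.csv, as shown by str()
raw_hours <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM")

# A fixed zone gives the same result on every machine ("UTC" is an assumption)
parsed <- as.POSIXct(raw_hours, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Any NA here would mean a string did not match the format
stopifnot(!anyNA(parsed))

# Hour of day on the 24-hour clock, as text
format(parsed, "%H")
```

Checking for `NA` right after parsing matters because `as.POSIXct()` returns `NA` silently when a string does not match the format.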

Analysis


Activity Levels and Device Engagement

According to this article on Medicine.net, the activity guidelines under the 10,000-step protocol are as follows:

Sedentary: Less than 5,000 steps daily
Low active: About 5,000 to 7,499 steps daily
Somewhat active: About 7,500 to 9,999 steps daily
Active: About 10,000 to 12,499 steps daily
Highly active: 12,500 or more steps daily

I will be using this information to categorize users into different activity levels.

# Create a new data frame to categorize users based on their average daily steps
activity_levels <- daily_activity %>%
  group_by(Id) %>%
  summarise(
    AvgSteps = mean(TotalSteps, na.rm = TRUE)
  ) %>%
  mutate(
    ActivityLevel = case_when(
      AvgSteps < 5000 ~ "Sedentary",
      AvgSteps >= 5000 & AvgSteps < 7500 ~ "Low Active",
      AvgSteps >= 7500 & AvgSteps < 10000 ~ "Somewhat Active",
      AvgSteps >= 10000 & AvgSteps < 12500 ~ "Active",
      AvgSteps >= 12500 ~ "Highly Active"
    )
  )

# Calculate the percentage of users in each activity level
activity_distribution <- activity_levels %>%
  group_by(ActivityLevel) %>%
  summarise(Count = n()) %>%
  mutate(Percentage = floor((Count / sum(Count)) * 100)) %>%
  select(ActivityLevel, Percentage) %>%
  arrange(desc(Percentage))

# Display the activity distribution
activity_distribution
## # A tibble: 5 × 2
##   ActivityLevel   Percentage
##   <chr>                <dbl>
## 1 Low Active              27
## 2 Somewhat Active         27
## 3 Sedentary               24
## 4 Active                  15
## 5 Highly Active            6

As we can see, most users fall into the Low Active, Somewhat Active, or Sedentary categories (roughly 78% combined). Far fewer are Active or Highly Active.
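The same banding can also be expressed with base R's `cut()`, which keeps the thresholds in one vector. This is only a sketch of an alternative to the `case_when()` call above; the `avg_steps` values are hypothetical and were chosen to land on each band and its boundaries.

```r
# Hypothetical per-user averages, chosen to hit each band and its boundaries
avg_steps <- c(4999, 5000, 7499.5, 7500, 10000, 12499, 12500)

activity_level <- cut(
  avg_steps,
  breaks = c(-Inf, 5000, 7500, 10000, 12500, Inf),
  labels = c("Sedentary", "Low Active", "Somewhat Active", "Active", "Highly Active"),
  right = FALSE  # intervals are [lower, upper), matching the >= / < rules above
)

data.frame(avg_steps, activity_level)
```

`cut()` returns an ordered set of factor levels in the order given by `labels`, so tables and plots list the bands from least to most active without extra sorting.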


Steps Taken vs Calories Burned

# Create scatter plot of steps taken vs calories burned
ggplot(data = daily_activity, aes(x = TotalSteps, y = Calories)) +  
  geom_point(size = 3, alpha = 0.6, color = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "darkblue") +  
  labs(title = "Relationship Between Daily Total Steps and Daily Calories Burned",
       x = "Total Steps Taken",
       y = "Calories Burned") + 
  theme_minimal() +  
  theme(plot.title = element_text(hjust = 0.5, size = 14),
        axis.title = element_text(size = 12),  
        legend.position = "none")  
## `geom_smooth()` using formula = 'y ~ x'

While this might seem obvious, the plot shows a clear positive relationship: days with more steps tend to be days with more calories burned.
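The trend line can be backed with numbers by computing a correlation coefficient and a regression slope. The sketch below does this on just the first five days for one user, copied from the `str()` output earlier in this report, so the figures it prints illustrate the calculation and are not results for the full dataset.

```r
# First five days for user 1503960366, copied from the str() output above
steps    <- c(13162, 10735, 10460, 9762, 12669)
calories <- c(1985, 1797, 1776, 1745, 1863)

# Pearson correlation: +1 is a perfect increasing line, 0 is no linear trend
r <- cor(steps, calories)

# Slope of the least-squares line: extra calories per additional step
fit <- lm(calories ~ steps)
slope <- unname(coef(fit)["steps"])

c(correlation = r, calories_per_step = slope)
```

To apply the same idea to the whole table, the two vectors would be replaced with `daily_activity$TotalSteps` and `daily_activity$Calories`.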


Steps per Hour

# Filter for hours between 7 am and 9 pm only
hourly_steps_filtered <- hourly_steps %>%
  mutate(Hour = lubridate::hour(ActivityHour)) %>%  # ActivityHour is already POSIXct after cleaning
  filter(Hour >= 7 & Hour <= 21) %>%  
  group_by(Hour) %>%
  summarise(
    AvgSteps = mean(StepTotal, na.rm = TRUE),
    MedianSteps = median(StepTotal, na.rm = TRUE)
  ) %>%
  arrange(AvgSteps)

# Create a line chart of hourly steps
ggplot(hourly_steps_filtered, aes(x = Hour, y = MedianSteps)) +
  geom_line(color = "blue", linewidth = 1.15) +
  geom_point(color = "blue", size = 3.25) +
  labs(
    title = "Median Steps by Hour (7 AM to 9 PM)",
    x = "Hour of Day",
    y = "Median Steps"
  ) +
  scale_x_continuous(breaks = seq(7, 21, 1), labels = c("7 AM", "8 AM", "9 AM", "10 AM", "11 AM", "12 PM", 
                                                        "1 PM", "2 PM", "3 PM", "4 PM", "5 PM", 
                                                        "6 PM", "7 PM", "8 PM", "9 PM")) +
  theme_minimal()

We can see that median steps per hour begin to decline after 1 PM and again after 6 PM, which suggests that activity peaks around midday and in the early evening.
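Reading peaks off a chart by eye can be checked programmatically. The helper `local_peaks()` below is a hypothetical function, not part of any package, that returns the positions where a value is higher than both of its neighbours. The medians in the example are made up for illustration and are NOT the values behind the chart above.

```r
# Return the positions where a value is strictly higher than both neighbours
local_peaks <- function(x) {
  n <- length(x)
  if (n < 3) return(integer(0))
  inner <- 2:(n - 1)
  inner[x[inner] > x[inner - 1] & x[inner] > x[inner + 1]]
}

# Made-up medians for 7 AM to 9 PM (illustrative only, not the study's results)
hours <- 7:21
median_steps_by_hour <- c(50, 90, 120, 130, 150, 200, 230, 180,
                          160, 170, 210, 260, 190, 120, 60)

# Hours at which the made-up series peaks
hours[local_peaks(median_steps_by_hour)]
```

With the real data, the input would be the `MedianSteps` column of `hourly_steps_filtered` after sorting it by `Hour`, since the summary above is arranged by `AvgSteps`.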


Steps per Day of the Week

# Create new column to contain weekday name
# (ActivityDate is already a Date after cleaning, so no re-parsing is needed)
daily_activity <- daily_activity %>%
  mutate(DayOfWeek = weekdays(ActivityDate))

# Group data frame by day of week and calculate median steps for each day
median_steps <- daily_activity %>%
  group_by(DayOfWeek) %>%
  summarise(MedianSteps = median(TotalSteps, na.rm = TRUE)) %>%
  arrange(match(DayOfWeek, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")))

# Custom order for ggplot to order from Sun-Sat
median_steps$DayOfWeek <- factor(median_steps$DayOfWeek,
                                 levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# Create a bar chart of median steps per day
ggplot(median_steps, aes(x = DayOfWeek, y = MedianSteps)) +
  geom_col(fill = "blue") +
  labs(x = "Day of the Week", y = "Median Steps", title = "Median Steps by Day of the Week") +
  theme_minimal()

We can see that users take the fewest steps on Sunday and the most on Tuesday.
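One caveat with `weekdays()` is that it returns names in the session's locale, so the English level names used for ordering would not match on a non-English system. The sketch below shows one locale-independent option based on the numeric `wday` field of `as.POSIXlt()`. The helper name `day_of_week()` is hypothetical.

```r
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday in every locale
day_of_week <- function(dates) {
  factor(day_levels[as.POSIXlt(dates)$wday + 1], levels = day_levels)
}

# The first six activity dates shown in the head() output above
sample_dates <- as.Date("2016-04-12") + 0:5
day_of_week(sample_dates)
```

Because the result is already a factor with Sunday-to-Saturday levels, it can be grouped and plotted without a separate reordering step.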


Conclusion and Recommendations

The analysis of smart device data reveals key trends in user activity, providing valuable insights into opportunities for Bellabeat to grow its customer base and enhance its marketing approach:


Targeting Different Activity Levels

Just over half of users fall into the “Low Active” (27%) or “Somewhat Active” (27%) categories, about a quarter are “Sedentary” (24%), and only a small share reach the “Highly Active” level (6%). Bellabeat could leverage this insight by positioning its products as tools to help users gradually increase their activity levels, potentially through customizable reminders or goal-setting features that encourage incremental improvements. Marketing messaging can focus on how Bellabeat devices support users in building sustainable activity habits, with achievable targets for those in the Sedentary to Somewhat Active ranges.


Emphasizing Calorie-Burning Benefits of Increased Steps

The positive correlation between steps taken and calories burned presents a clear opportunity to emphasize how Bellabeat products can aid in managing or improving physical health by increasing daily steps. This could be highlighted in campaigns or user stories that focus on fitness and wellness, showing how Bellabeat’s devices support healthy, active lifestyles.


Optimizing Notification Timing for Activity Reminders

Given that user activity generally increases throughout the morning and peaks around 1 PM, Bellabeat could enhance user engagement by sending encouraging notifications in line with these natural activity patterns. For example, sending reminders around mid-morning could motivate users just as they are beginning to be more active, while late-afternoon reminders could sustain activity levels before they begin to wind down in the evening. This timing strategy aligns with users’ daily rhythms, making notifications feel timely and relevant.


Promoting Consistency in Activity Across the Week

With Sunday showing the lowest median steps, and Tuesday the highest, Bellabeat has an opportunity to encourage users to stay active on weekends. Marketing initiatives might include weekend challenges or social media campaigns that motivate users to maintain consistent activity, even on less active days. This can be reinforced by Bellabeat app features like activity streaks or badges for maintaining daily step goals across the entire week, helping users see weekends as an extension of their wellness routine.


Overall Marketing Strategy

Bellabeat can position its products as accessible, health-enhancing tools that adapt to a wide range of activity levels and lifestyles. By focusing on personalized guidance and promoting the health benefits of small, consistent activity increases, Bellabeat can attract a broader audience. Targeted advertising, informative content on the benefits of daily movement, and time-sensitive notifications could increase engagement, helping Bellabeat expand its market presence among women seeking to improve their wellness habits in sustainable ways.