Introduction

The business task is to analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices, and then to apply these insights to one Bellabeat product.

Public data from Kaggle.com was used to explore the daily habits of smart device users. The Fitbit Fitness Tracker Data set (CC0: Public Domain, made available through Mobius) contains personal fitness tracker data from thirty Fitbit users. These users consented to the submission of their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. The data set includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

I chose to focus on daily activity, steps, and calories. The data was collected from 30+ users over a period of 2 months. It should be noted that the results may be biased: the sample is small, and the weather and/or season during the collection period could affect activity levels and therefore the credibility of the findings.

Data Preparation

Load Datasets

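# Load the packages used throughout the analysis
library(tidyverse)  # read_csv, map, bind_rows, dplyr verbs, ggplot2
library(janitor)    # clean_names
library(lubridate)  # mdy

# The two daily activity files to combine
daily_activity <- c("dailyActivity_merged.csv", "dailyActivity_merged2.csv")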

fitbit_activity <- map(daily_activity, read_csv)
## Rows: 457 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
all_activity <- bind_rows(fitbit_activity)

Data Cleaning

## Rows: 1,397
## Columns: 15
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate             <chr> "3/25/2016", "3/26/2016", "3/27/2016", "3/28/…
## $ TotalSteps               <dbl> 11004, 17609, 12736, 13231, 12041, 10970, 122…
## $ TotalDistance            <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.…
## $ TrackerDistance          <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance       <dbl> 2.57, 6.92, 4.66, 3.19, 2.16, 2.36, 2.29, 3.3…
## $ ModeratelyActiveDistance <dbl> 0.46, 0.73, 0.16, 0.79, 1.09, 0.51, 0.49, 0.8…
## $ LightActiveDistance      <dbl> 4.07, 3.91, 3.71, 4.95, 4.61, 4.29, 5.04, 3.6…
## $ SedentaryActiveDistance  <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.0…
## $ VeryActiveMinutes        <dbl> 33, 89, 56, 39, 28, 30, 33, 47, 40, 15, 43, 3…
## $ FairlyActiveMinutes      <dbl> 12, 17, 5, 20, 28, 13, 12, 21, 11, 30, 18, 18…
## $ LightlyActiveMinutes     <dbl> 205, 274, 268, 224, 243, 223, 239, 200, 244, …
## $ SedentaryMinutes         <dbl> 804, 588, 605, 1080, 763, 1174, 820, 866, 636…
## $ Calories                 <dbl> 1819, 2154, 1944, 1932, 1886, 1820, 1889, 186…
##        Id            ActivityDate         TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Length:1397        Min.   :    0   Min.   : 0.000  
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 3146   1st Qu.: 2.170  
##  Median :4.445e+09   Mode  :character   Median : 6999   Median : 4.950  
##  Mean   :4.781e+09                      Mean   : 7281   Mean   : 5.219  
##  3rd Qu.:6.962e+09                      3rd Qu.:10544   3rd Qu.: 7.500  
##  Max.   :8.878e+09                      Max.   :36019   Max.   :28.030  
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.000   Min.   :0.0000           Min.   : 0.000    
##  1st Qu.: 2.160   1st Qu.:0.0000           1st Qu.: 0.000    
##  Median : 4.950   Median :0.0000           Median : 0.100    
##  Mean   : 5.192   Mean   :0.1315           Mean   : 1.397    
##  3rd Qu.: 7.480   3rd Qu.:0.0000           3rd Qu.: 1.830    
##  Max.   :28.030   Max.   :6.7271           Max.   :21.920    
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   : 0.000      Min.   :0.000000       
##  1st Qu.:0.0000           1st Qu.: 1.610      1st Qu.:0.000000       
##  Median :0.2000           Median : 3.240      Median :0.000000       
##  Mean   :0.5385           Mean   : 3.193      Mean   :0.001704       
##  3rd Qu.:0.7700           3rd Qu.: 4.690      3rd Qu.:0.000000       
##  Max.   :6.4800           Max.   :12.510      Max.   :0.110000       
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.0       Min.   :  0.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.0       1st Qu.:111.0        1st Qu.: 729.0  
##  Median :  2.00    Median :  6.0       Median :195.0        Median :1057.0  
##  Mean   : 19.68    Mean   : 13.4       Mean   :185.4        Mean   : 992.5  
##  3rd Qu.: 30.00    3rd Qu.: 18.0       3rd Qu.:262.0        3rd Qu.:1244.0  
##  Max.   :210.00    Max.   :660.0       Max.   :720.0        Max.   :1440.0  
##     Calories   
##  Min.   :   0  
##  1st Qu.:1799  
##  Median :2114  
##  Mean   :2266  
##  3rd Qu.:2770  
##  Max.   :4900
## # A tibble: 1,397 × 15
##            Id ActivityDate TotalSteps TotalDistance TrackerDistance
##         <dbl> <chr>             <dbl>         <dbl>           <dbl>
##  1 1503960366 3/25/2016         11004          7.11            7.11
##  2 1503960366 3/26/2016         17609         11.6            11.6 
##  3 1503960366 3/27/2016         12736          8.53            8.53
##  4 1503960366 3/28/2016         13231          8.93            8.93
##  5 1503960366 3/29/2016         12041          7.85            7.85
##  6 1503960366 3/30/2016         10970          7.16            7.16
##  7 1503960366 3/31/2016         12256          7.86            7.86
##  8 1503960366 4/1/2016          12262          7.87            7.87
##  9 1503960366 4/2/2016          11248          7.25            7.25
## 10 1503960366 4/3/2016          10016          6.37            6.37
## # ℹ 1,387 more rows
## # ℹ 10 more variables: LoggedActivitiesDistance <dbl>,
## #   VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## #   LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## #   VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>
##  [1] "id"                         "activity_date"             
##  [3] "total_steps"                "total_distance"            
##  [5] "tracker_distance"           "logged_activities_distance"
##  [7] "very_active_distance"       "moderately_active_distance"
##  [9] "light_active_distance"      "sedentary_active_distance" 
## [11] "very_active_minutes"        "fairly_active_minutes"     
## [13] "lightly_active_minutes"     "sedentary_minutes"         
## [15] "calories"
## Rows: 1,397
## Columns: 15
## $ id                         <dbl> 1503960366, 1503960366, 1503960366, 1503960…
## $ activity_date              <date> 2016-03-25, 2016-03-26, 2016-03-27, 2016-0…
## $ total_steps                <dbl> 11004, 17609, 12736, 13231, 12041, 10970, 1…
## $ total_distance             <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, …
## $ tracker_distance           <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, …
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance       <dbl> 2.57, 6.92, 4.66, 3.19, 2.16, 2.36, 2.29, 3…
## $ moderately_active_distance <dbl> 0.46, 0.73, 0.16, 0.79, 1.09, 0.51, 0.49, 0…
## $ light_active_distance      <dbl> 4.07, 3.91, 3.71, 4.95, 4.61, 4.29, 5.04, 3…
## $ sedentary_active_distance  <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ very_active_minutes        <dbl> 33, 89, 56, 39, 28, 30, 33, 47, 40, 15, 43,…
## $ fairly_active_minutes      <dbl> 12, 17, 5, 20, 28, 13, 12, 21, 11, 30, 18, …
## $ lightly_active_minutes     <dbl> 205, 274, 268, 224, 243, 223, 239, 200, 244…
## $ sedentary_minutes          <dbl> 804, 588, 605, 1080, 763, 1174, 820, 866, 6…
## $ calories                   <dbl> 1819, 2154, 1944, 1932, 1886, 1820, 1889, 1…
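
The output above does not show how the "30+ users" and "2 months" figures were confirmed. A minimal sketch of such a check follows, using a small made-up data frame (`toy_activity`, hypothetical values) in place of the combined data. Whether the two source files actually contain overlapping rows is not stated here, so the de-duplication step is a precaution rather than a known requirement.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the combined data (hypothetical values);
# the real analysis would start from all_activity after clean_names()
toy_activity <- data.frame(
  id = c(1, 1, 2, 2, 2),
  activity_date = c("3/25/2016", "3/26/2016", "3/25/2016", "3/25/2016", "4/1/2016"),
  total_steps = c(11004, 17609, 5000, 5000, 7200)
)

# Parse the dates, then keep one row per user and day
toy_clean <- toy_activity %>%
  mutate(activity_date = mdy(activity_date)) %>%
  distinct(id, activity_date, .keep_all = TRUE)

# Number of distinct users and the date span covered
coverage <- toy_clean %>%
  summarize(
    n_users = n_distinct(id),
    first_day = min(activity_date),
    last_day = max(activity_date)
  )
coverage
```

Running the same steps on the real data would show the actual user count and collection window.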

Data Analysis

Summary Statistics

## # A tibble: 1 × 3
##   avg_steps avg_calories avg_distance
##       <dbl>        <dbl>        <dbl>
## 1     7281.        2266.         5.22
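
The code that produced this table is not shown. One way such a summary could be computed is sketched below on a small made-up data frame (`toy_activity`, hypothetical values); the column names follow the cleaned names used elsewhere in this report.

```r
library(dplyr)

# Toy stand-in for cleaned_activity (hypothetical values)
toy_activity <- data.frame(
  total_steps = c(11004, 17609, 0, 6999),
  calories = c(1819, 2154, 1500, 2114),
  total_distance = c(7.11, 11.55, 0, 4.95)
)

# One-row summary of the averages across all user-days
summary_stats <- toy_activity %>%
  summarize(
    avg_steps = mean(total_steps, na.rm = TRUE),
    avg_calories = mean(calories, na.rm = TRUE),
    avg_distance = mean(total_distance, na.rm = TRUE)
  )
summary_stats
```

Note that rows with zero steps, such as days when the tracker was not worn, pull these averages down.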

Total Steps by User

## `geom_smooth()` using formula = 'y ~ x'

Total Steps Over Time

# Identify trends by total steps over time

# Clean column names
cleaned_activity <- clean_names(all_activity)

# Inspect column names
colnames(cleaned_activity)
##  [1] "id"                         "activity_date"             
##  [3] "total_steps"                "total_distance"            
##  [5] "tracker_distance"           "logged_activities_distance"
##  [7] "very_active_distance"       "moderately_active_distance"
##  [9] "light_active_distance"      "sedentary_active_distance" 
## [11] "very_active_minutes"        "fairly_active_minutes"     
## [13] "lightly_active_minutes"     "sedentary_minutes"         
## [15] "calories"

# Inspect the format of activity_date
head(cleaned_activity$activity_date)
## [1] "3/25/2016" "3/26/2016" "3/27/2016" "3/28/2016" "3/29/2016" "3/30/2016"
# Convert activity_date from month/day/year strings to Date
cleaned_activity <- cleaned_activity %>%
  mutate(activity_date = mdy(activity_date))
head(cleaned_activity)
# Summarize total steps by day (activity_date is already a Date after mdy())
daily_steps <- cleaned_activity %>%
  group_by(activity_date) %>%
  summarize(total_steps = sum(total_steps, na.rm = TRUE))
## `geom_smooth()` using formula = 'y ~ x'
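
The plotting code for this section is not shown. A sketch of how the daily totals might be plotted follows, using a made-up data frame (`toy_daily_steps`, hypothetical values) shaped like `daily_steps`. The choice of `method = "lm"` and the colors are assumptions.

```r
library(ggplot2)

# Toy stand-in for daily_steps (hypothetical values)
toy_daily_steps <- data.frame(
  activity_date = as.Date(c("2016-03-25", "2016-03-26", "2016-03-27", "2016-03-28")),
  total_steps = c(210000, 225000, 198000, 240000)
)

# Line plot of daily totals with a linear trend line
steps_over_time <- ggplot(toy_daily_steps, aes(x = activity_date, y = total_steps)) +
  geom_line(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Total Steps Over Time", x = "Date", y = "Total Steps") +
  theme_minimal()
```

Because the daily value is a sum across users, it also rises and falls with the number of users reporting on a given day. Plotting the daily mean instead would separate that effect from a genuine change in activity.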

User Distribution by Activity Level

# Pie chart of users by activity level
# Calculate average steps per day for each user
user_avg_steps <- cleaned_activity %>%
  group_by(id) %>%
  summarize(avg_steps_per_day = mean(total_steps, na.rm = TRUE))

# Categorize users based on average steps
user_categories <- user_avg_steps %>%
  mutate(category = case_when(
    avg_steps_per_day > 12000 ~ "Very Active",
    avg_steps_per_day >= 7500 & avg_steps_per_day <= 12000 ~ "Moderately Active",
    avg_steps_per_day >= 5000 & avg_steps_per_day < 7500 ~ "Fairly Active",
    avg_steps_per_day >= 2500 & avg_steps_per_day < 5000 ~ "Lightly Active",
    avg_steps_per_day < 2500 ~ "Sedentary",
    TRUE ~ "Unknown"
  ))

# Summarize the count of users in each category
category_summary <- user_categories %>%
  count(category) %>%
  mutate(percentage = n / sum(n) * 100)

# Order factor levels for category
category_summary$category <- factor(category_summary$category, 
                                    levels = c("Very Active", "Moderately Active", "Fairly Active", "Lightly Active", "Sedentary"))
# Create pie chart with percentages
ggplot(category_summary, aes(x = "", y = percentage, fill = category)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y") +
  geom_text(aes(label = paste0(round(percentage, 1), "%")), 
            position = position_stack(vjust = 0.5), 
            color = "black", size = 3) +
  labs(title = "User Distribution by Activity Level", x = NULL, y = NULL) +
  theme_void() +
  scale_fill_manual(values = c("Very Active" = "purple", 
                                "Moderately Active" = "green", 
                                "Fairly Active" = "orange", 
                                "Lightly Active" = "yellow", 
                                "Sedentary" = "pink"))

Calories vs. Steps

# Summarize data to get average steps and calories
daily_summary_calories_steps <- cleaned_activity %>%
  group_by(activity_date) %>%
  summarize(
    avg_steps = mean(total_steps, na.rm = TRUE),
    avg_calories = mean(calories, na.rm = TRUE)
  )
# Plot calories vs. steps with a regression line
ggplot(data = daily_summary_calories_steps, aes(x = avg_steps, y = avg_calories)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Calories vs. Steps", x = "Average Steps", y = "Average Calories") +
  scale_x_continuous(labels = function(x) format(x, scientific = FALSE)) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
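
The regression line shows the direction of the relationship but not its strength. A minimal sketch of how it could be quantified is given below, using a made-up data frame (`toy_summary`, hypothetical values) shaped like `daily_summary_calories_steps`.

```r
# Toy stand-in for daily_summary_calories_steps (hypothetical values)
toy_summary <- data.frame(
  avg_steps = c(5200, 6100, 7300, 8000, 9400),
  avg_calories = c(2050, 2140, 2260, 2310, 2480)
)

# Pearson correlation between the two daily averages
steps_calories_cor <- cor(toy_summary$avg_steps, toy_summary$avg_calories)

# Simple linear regression, the same kind of model geom_smooth(method = "lm") draws
steps_calories_fit <- lm(avg_calories ~ avg_steps, data = toy_summary)
slope <- coef(steps_calories_fit)[["avg_steps"]]

steps_calories_cor
slope
```

The slope can be read as the estimated change in average calories per additional average step, which is easier to communicate than the correlation alone.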

Conclusion

Observations

  1. Total Steps by User:
    • No significant trend is observed in total steps by individual users.
  2. Total Steps Over Time:
    • Total steps increase over time, which coincides with the season getting warmer.
  3. User Distribution by Activity Level:
    • The distribution of users by activity level is relatively even, with similar proportions of users in the “Very/Moderately Active” categories and the “Fairly/Lightly Active” categories.
  4. Calories Burned vs. Steps:
    • There is a positive correlation between average steps and average calories burned per day, indicating that more calories tend to be burned on days with more steps.

Influence on Bellabeat Marketing Strategy

  1. Targeted Marketing Campaigns:
    • Use data on seasonal trends to create targeted marketing campaigns that resonate with users’ activity patterns. For example, promoting outdoor activity products in the spring and summer.
  2. Segmentation and Personalization:
    • Segment users based on their activity levels and personalize marketing messages to address their specific needs and motivations. Sedentary users might be encouraged with beginner-friendly content, while very active users could be targeted with advanced fitness products and features.
  3. Emphasize Holistic Health:
    • Highlight Bellabeat’s commitment to holistic health in marketing messages. Emphasize features that track various health metrics and promote the overall wellness benefits of using Bellabeat products.
  4. Leverage User Stories and Testimonials:
    • Share success stories and testimonials from users who have achieved their health and fitness goals with Bellabeat. This can inspire other users and build trust in the brand.
  5. Promote Social and Community Features:
    • Market the social and community aspects of Bellabeat products to appeal to users who value social connections and shared experiences. Highlight community challenges, achievements, and social sharing capabilities.

By aligning Bellabeat’s marketing strategy with these smart device usage trends, the company can better meet the needs of its customers, enhance user engagement, and drive growth through targeted and personalized marketing efforts.