The business task is to analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices, and then to apply these insights to one Bellabeat product in the presentation.
Public data from Kaggle.com was used to explore the daily habits of smart device users. The Fitbit Fitness Tracker Data dataset (CC0: Public Domain, made available through Mobius) contains personal tracker data from thirty eligible Fitbit users who consented to its submission, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
I chose to focus on daily activity, steps, and calories. The data was collected from 30+ users over a period of two months, so the results may be biased: the sample is small, and the weather or season during that period could also have affected activity levels and, with them, the credibility of the findings.
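library(tidyverse)  # readr::read_csv(), purrr::map(), dplyr, ggplot2
library(janitor)    # clean_names()
library(lubridate)  # mdy()

daily_activity <- c("dailyActivity_merged.csv", "dailyActivity_merged2.csv")
fitbit_activity <- map(daily_activity, read_csv)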
## Rows: 457 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
all_activity <- bind_rows(fitbit_activity)
glimpse(all_activity)
## Rows: 1,397
## Columns: 15
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate <chr> "3/25/2016", "3/26/2016", "3/27/2016", "3/28/…
## $ TotalSteps <dbl> 11004, 17609, 12736, 13231, 12041, 10970, 122…
## $ TotalDistance <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.…
## $ TrackerDistance <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, 7.…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 2.57, 6.92, 4.66, 3.19, 2.16, 2.36, 2.29, 3.3…
## $ ModeratelyActiveDistance <dbl> 0.46, 0.73, 0.16, 0.79, 1.09, 0.51, 0.49, 0.8…
## $ LightActiveDistance <dbl> 4.07, 3.91, 3.71, 4.95, 4.61, 4.29, 5.04, 3.6…
## $ SedentaryActiveDistance <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.0…
## $ VeryActiveMinutes <dbl> 33, 89, 56, 39, 28, 30, 33, 47, 40, 15, 43, 3…
## $ FairlyActiveMinutes <dbl> 12, 17, 5, 20, 28, 13, 12, 21, 11, 30, 18, 18…
## $ LightlyActiveMinutes <dbl> 205, 274, 268, 224, 243, 223, 239, 200, 244, …
## $ SedentaryMinutes <dbl> 804, 588, 605, 1080, 763, 1174, 820, 866, 636…
## $ Calories <dbl> 1819, 2154, 1944, 1932, 1886, 1820, 1889, 186…
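Because `bind_rows()` stacks the two exports as-is, a user-day recorded in both files would be counted twice. Whether the two exports actually overlap is not established here; the following is a minimal sketch of the check, assuming the list returned by `map()` is given names first (for example `names(fitbit_activity) <- daily_activity`) so that `.id` can record each row's source file. The objects `period_1`, `period_2` and their rows are hypothetical toy data:

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows standing in for the two CSV exports (not real data)
period_1 <- tibble(
  Id = c(1, 1),
  ActivityDate = c("4/11/2016", "4/12/2016"),
  TotalSteps = c(9000, 4000)
)
period_2 <- tibble(
  Id = c(1, 1),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  TotalSteps = c(8000, 7000)
)

# Stack a named list; .id adds a column recording which export each row came from
stacked <- bind_rows(list(period_1 = period_1, period_2 = period_2),
                     .id = "source_file")

# User-days that appear more than once after stacking
duplicate_days <- stacked %>%
  count(Id, ActivityDate, name = "n_rows") %>%
  filter(n_rows > 1)

# All rows belonging to a duplicated user-day, to inspect before deciding what to keep
duplicate_rows <- stacked %>%
  semi_join(duplicate_days, by = c("Id", "ActivityDate"))
duplicate_rows
```

The toy duplicates deliberately disagree on `TotalSteps`, since a day split across two exports may be partial in one of them; which row to keep is a judgment call rather than something `distinct()` can decide automatically.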
summary(all_activity)
## Id ActivityDate TotalSteps TotalDistance
## Min. :1.504e+09 Length:1397 Min. : 0 Min. : 0.000
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 3146 1st Qu.: 2.170
## Median :4.445e+09 Mode :character Median : 6999 Median : 4.950
## Mean :4.781e+09 Mean : 7281 Mean : 5.219
## 3rd Qu.:6.962e+09 3rd Qu.:10544 3rd Qu.: 7.500
## Max. :8.878e+09 Max. :36019 Max. :28.030
## TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 2.160 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 4.950 Median :0.0000 Median : 0.100
## Mean : 5.192 Mean :0.1315 Mean : 1.397
## 3rd Qu.: 7.480 3rd Qu.:0.0000 3rd Qu.: 1.830
## Max. :28.030 Max. :6.7271 Max. :21.920
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.610 1st Qu.:0.000000
## Median :0.2000 Median : 3.240 Median :0.000000
## Mean :0.5385 Mean : 3.193 Mean :0.001704
## 3rd Qu.:0.7700 3rd Qu.: 4.690 3rd Qu.:0.000000
## Max. :6.4800 Max. :12.510 Max. :0.110000
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.:111.0 1st Qu.: 729.0
## Median : 2.00 Median : 6.0 Median :195.0 Median :1057.0
## Mean : 19.68 Mean : 13.4 Mean :185.4 Mean : 992.5
## 3rd Qu.: 30.00 3rd Qu.: 18.0 3rd Qu.:262.0 3rd Qu.:1244.0
## Max. :210.00 Max. :660.0 Max. :720.0 Max. :1440.0
## Calories
## Min. : 0
## 1st Qu.:1799
## Median :2114
## Mean :2266
## 3rd Qu.:2770
## Max. :4900
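The summary shows a minimum of 0 for both `TotalSteps` and `Calories`, and a maximum of 1,440 `SedentaryMinutes`, which is a full 24-hour day. Such rows may be days on which the tracker was not worn; that reading is an assumption, since the dataset does not label non-wear days. A minimal sketch that flags them so their effect on the averages can be checked (the flag rule and the toy rows are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows using the raw CSV column names
toy_activity <- tibble(
  Id = c(1, 1, 2),
  TotalSteps = c(0, 11004, 0),
  SedentaryMinutes = c(1440, 804, 600),
  Calories = c(0, 1819, 1500)
)

# Assumed rule: no steps AND every minute of the day logged as sedentary
flagged <- toy_activity %>%
  mutate(possible_non_wear = TotalSteps == 0 & SedentaryMinutes == 1440)

# Mean steps with and without the flagged days
step_means <- flagged %>%
  summarize(
    mean_steps_all  = mean(TotalSteps),
    mean_steps_worn = mean(TotalSteps[!possible_non_wear])
  )
step_means
```

The third toy row has 0 steps but only 600 sedentary minutes, so the rule leaves it unflagged; a stricter or looser rule would change which days are excluded.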
all_activity
## # A tibble: 1,397 × 15
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 3/25/2016 11004 7.11 7.11
## 2 1503960366 3/26/2016 17609 11.6 11.6
## 3 1503960366 3/27/2016 12736 8.53 8.53
## 4 1503960366 3/28/2016 13231 8.93 8.93
## 5 1503960366 3/29/2016 12041 7.85 7.85
## 6 1503960366 3/30/2016 10970 7.16 7.16
## 7 1503960366 3/31/2016 12256 7.86 7.86
## 8 1503960366 4/1/2016 12262 7.87 7.87
## 9 1503960366 4/2/2016 11248 7.25 7.25
## 10 1503960366 4/3/2016 10016 6.37 6.37
## # ℹ 1,387 more rows
## # ℹ 10 more variables: LoggedActivitiesDistance <dbl>,
## # VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## # LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## # VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## # LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>
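The outputs above report value ranges but not how many distinct users or days the combined table covers, which is what the description of 30+ users over two months rests on. A minimal sketch of that check; the helper `coverage_summary()` is hypothetical and the toy rows are not from the dataset:

```r
library(dplyr)
library(lubridate)
library(tibble)

# Hypothetical helper: distinct users, row count and date span of a daily
# activity table that still uses the raw CSV column names (Id, ActivityDate)
coverage_summary <- function(df) {
  df %>%
    mutate(day = mdy(ActivityDate)) %>%
    summarize(
      n_users   = n_distinct(Id),
      n_rows    = n(),
      first_day = min(day),
      last_day  = max(day),
      span_days = as.integer(last_day - first_day) + 1L  # inclusive day count
    )
}

# Toy example (not the Fitbit data)
toy <- tibble(
  Id = c(1, 1, 2),
  ActivityDate = c("3/25/2016", "4/1/2016", "5/12/2016")
)
coverage_summary(toy)
```

On the real data the call would be `coverage_summary(all_activity)`.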
## Rows: 1,397
## Columns: 15
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960…
## $ activity_date <date> 2016-03-25, 2016-03-26, 2016-03-27, 2016-0…
## $ total_steps <dbl> 11004, 17609, 12736, 13231, 12041, 10970, 1…
## $ total_distance <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, …
## $ tracker_distance <dbl> 7.11, 11.55, 8.53, 8.93, 7.85, 7.16, 7.86, …
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance <dbl> 2.57, 6.92, 4.66, 3.19, 2.16, 2.36, 2.29, 3…
## $ moderately_active_distance <dbl> 0.46, 0.73, 0.16, 0.79, 1.09, 0.51, 0.49, 0…
## $ light_active_distance <dbl> 4.07, 3.91, 3.71, 4.95, 4.61, 4.29, 5.04, 3…
## $ sedentary_active_distance <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ very_active_minutes <dbl> 33, 89, 56, 39, 28, 30, 33, 47, 40, 15, 43,…
## $ fairly_active_minutes <dbl> 12, 17, 5, 20, 28, 13, 12, 21, 11, 30, 18, …
## $ lightly_active_minutes <dbl> 205, 274, 268, 224, 243, 223, 239, 200, 244…
## $ sedentary_minutes <dbl> 804, 588, 605, 1080, 763, 1174, 820, 866, 6…
## $ calories <dbl> 1819, 2154, 1944, 1932, 1886, 1820, 1889, 1…
## # A tibble: 1 × 3
## avg_steps avg_calories avg_distance
## <dbl> <dbl> <dbl>
## 1 7281. 2266. 5.22
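The code that produced this one-row table does not appear in the section. Its values match the means of `TotalSteps`, `Calories` and `TotalDistance` in the `summary()` output (7281, 2266 and 5.219), so a sketch along these lines, assuming the snake_case names created by `janitor::clean_names()`, would reproduce it. The helper name `summarize_averages()` and the toy rows are hypothetical:

```r
library(dplyr)
library(tibble)

# Hypothetical helper: one-row table of average steps, calories and distance
summarize_averages <- function(df) {
  df %>%
    summarize(
      avg_steps    = mean(total_steps, na.rm = TRUE),
      avg_calories = mean(calories, na.rm = TRUE),
      avg_distance = mean(total_distance, na.rm = TRUE)
    )
}

# Toy rows (illustrative only)
toy <- tibble(
  total_steps    = c(11004, 17609),
  calories       = c(1819, 2154),
  total_distance = c(7.11, 11.55)
)
summarize_averages(toy)
```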
# Identify trends by total steps over time
# Clean column names
cleaned_activity <- clean_names(all_activity)
# Inspect column names
colnames(cleaned_activity)
## [1] "id" "activity_date"
## [3] "total_steps" "total_distance"
## [5] "tracker_distance" "logged_activities_distance"
## [7] "very_active_distance" "moderately_active_distance"
## [9] "light_active_distance" "sedentary_active_distance"
## [11] "very_active_minutes" "fairly_active_minutes"
## [13] "lightly_active_minutes" "sedentary_minutes"
## [15] "calories"
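`clean_names()` converts the CamelCase CSV headers to snake_case, which is why the code from here on refers to `total_steps` rather than `TotalSteps`. A small illustration on a toy tibble carrying four of the original headers:

```r
library(tibble)
library(janitor)

# Toy tibble with a few of the original CamelCase headers
raw_headers <- tibble(
  Id = 1,
  ActivityDate = "3/25/2016",
  TotalSteps = 11004,
  LoggedActivitiesDistance = 0
)

# clean_names() defaults to snake_case
cleaned_headers <- names(clean_names(raw_headers))
cleaned_headers
```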
# Inspect the format of activity_date
head(cleaned_activity$activity_date)
## [1] "3/25/2016" "3/26/2016" "3/27/2016" "3/28/2016" "3/29/2016" "3/30/2016"
# Convert activity_date from "m/d/yyyy" text to Date
cleaned_activity <- cleaned_activity %>%
  mutate(activity_date = mdy(activity_date))
head(cleaned_activity)
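`mdy()` parses month-day-year strings with or without zero padding ("4/1/2016" as well as "3/25/2016"). For comparison, a sketch of the base R equivalent with an explicit format string; once the column is a `Date`, a later `as.Date()` call on it leaves the values unchanged:

```r
library(lubridate)

raw_dates <- c("3/25/2016", "4/1/2016", "5/12/2016")

# lubridate: month-day-year order, padding optional
parsed_mdy <- mdy(raw_dates)

# Base R equivalent with an explicit format string
parsed_base <- as.Date(raw_dates, format = "%m/%d/%Y")

# Both give Date vectors with the same values
all(parsed_mdy == parsed_base)
```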
# Summarize total steps by day
daily_steps <- cleaned_activity %>%
  mutate(activity_date = as.Date(activity_date)) %>%
  group_by(activity_date) %>%
  summarize(total_steps = sum(total_steps, na.rm = TRUE))
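The plotting code for the step trend does not appear in this section. A sketch of one way to chart it, with one adjustment that is an assumption rather than part of the original analysis: because `sum()` adds up however many users happened to log on a given date, the daily total also moves with participation, so the sketch divides by the number of reporting users. The toy rows and the name `daily_steps_norm` are hypothetical:

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Toy stand-in for cleaned_activity (hypothetical values)
toy_activity <- tibble(
  id = c(1, 2, 1, 2, 3),
  activity_date = as.Date(c("2016-04-12", "2016-04-12",
                            "2016-04-13", "2016-04-13", "2016-04-13")),
  total_steps = c(10000, 6000, 9000, 5000, 7000)
)

# Per-day total, number of reporting users, and steps per reporting user
daily_steps_norm <- toy_activity %>%
  group_by(activity_date) %>%
  summarize(
    total_steps = sum(total_steps, na.rm = TRUE),
    n_users = n_distinct(id),
    avg_steps_per_user = total_steps / n_users  # uses the summed total above
  )

# Line chart with a straight-line trend (same lm smoother as the calories chart)
p <- ggplot(daily_steps_norm, aes(x = activity_date, y = avg_steps_per_user)) +
  geom_line(color = "steelblue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(title = "Average Steps per User by Day",
       x = "Date", y = "Average steps per user") +
  theme_minimal()
```

Calling `print(p)` draws the chart; on the real data `toy_activity` would be replaced by `cleaned_activity`.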
# Pie chart
# Calculate average steps per day for each user
user_avg_steps <- cleaned_activity %>%
  group_by(id) %>%
  summarize(avg_steps_per_day = mean(total_steps, na.rm = TRUE))
# Categorize users based on average steps
user_categories <- user_avg_steps %>%
  mutate(category = case_when(
    avg_steps_per_day > 12000 ~ "Very Active",
    avg_steps_per_day >= 7500 & avg_steps_per_day <= 12000 ~ "Moderately Active",
    avg_steps_per_day >= 5000 & avg_steps_per_day < 7500 ~ "Fairly Active",
    avg_steps_per_day >= 2500 & avg_steps_per_day < 5000 ~ "Lightly Active",
    avg_steps_per_day < 2500 ~ "Sedentary",
    TRUE ~ "Unknown"
  ))
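Chained `>=`/`<` conditions are easy to get subtly wrong at the band edges. Because `case_when()` returns the first branch whose condition is `TRUE`, the same bands can be written with single lower bounds; a sketch that checks the boundary values (the helper `categorize_steps()` is hypothetical):

```r
library(dplyr)

# Same thresholds as the case_when() above, relying on first-match order
categorize_steps <- function(avg_steps_per_day) {
  case_when(
    avg_steps_per_day > 12000 ~ "Very Active",
    avg_steps_per_day >= 7500 ~ "Moderately Active",  # 7500 to 12000 inclusive
    avg_steps_per_day >= 5000 ~ "Fairly Active",
    avg_steps_per_day >= 2500 ~ "Lightly Active",
    avg_steps_per_day < 2500 ~ "Sedentary",
    TRUE ~ "Unknown"  # only reached when the input is NA or NaN
  )
}

# Boundary values: 12000 stays "Moderately Active"; NA falls through to "Unknown"
categorize_steps(c(2499, 2500, 5000, 7500, 12000, 12001, NA))
```

A user whose average is `NA` would be labelled "Unknown", which is not among the factor levels set for the pie chart, so that label would become `NA` there.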
# Summarize the count of users in each category
category_summary <- user_categories %>%
  count(category) %>%
  mutate(percentage = n / sum(n) * 100)
# Order factor levels for category
category_summary$category <- factor(category_summary$category,
                                    levels = c("Very Active", "Moderately Active", "Fairly Active", "Lightly Active", "Sedentary"))
# Create pie chart with percentages
# (percentage is already on a 0-100 scale, so no percent axis formatter is applied;
# theme_void() hides the axis in any case)
ggplot(category_summary, aes(x = "", y = percentage, fill = category)) +
  geom_col(width = 1) +
  coord_polar("y") +
  geom_text(aes(label = paste0(round(percentage, 1), "%")),
            position = position_stack(vjust = 0.5),
            color = "black", size = 3) +
  labs(title = "User Distribution by Activity Level", x = NULL, y = NULL) +
  theme_void() +
  scale_fill_manual(values = c("Very Active" = "purple",
                               "Moderately Active" = "green",
                               "Fairly Active" = "orange",
                               "Lightly Active" = "yellow",
                               "Sedentary" = "pink"))
# Summarize data to get average steps and calories
daily_summary_calories_steps <- cleaned_activity %>%
  group_by(activity_date) %>%
  summarize(
    avg_steps = mean(total_steps, na.rm = TRUE),
    avg_calories = mean(calories, na.rm = TRUE)
  )
# Plot calories vs. steps with a regression line
ggplot(data = daily_summary_calories_steps, aes(x = avg_steps, y = avg_calories)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Calories vs. Steps", x = "Average Steps", y = "Average Calories") +
  scale_x_continuous(labels = function(x) format(x, scientific = FALSE)) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
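The red line shows the direction of the calories-steps relationship but not its strength. A sketch that quantifies it with base R's `cor()` and `lm()`, run here on hypothetical toy values chosen to be roughly linear; on the real data the same calls would use `daily_summary_calories_steps`. Because both columns are averaged by date, the result describes day-to-day variation across the whole sample, not how calories scale with steps for an individual user.

```r
library(tibble)

# Hypothetical toy daily averages (illustrative only)
toy_daily <- tibble(
  avg_steps    = c(5000, 6000, 7000, 8000, 9000),
  avg_calories = c(2000, 2110, 2190, 2320, 2390)
)

# Pearson correlation: strength of the linear association
r <- cor(toy_daily$avg_steps, toy_daily$avg_calories)

# The straight line geom_smooth(method = "lm") draws: intercept and
# additional calories per additional step
fit <- lm(avg_calories ~ avg_steps, data = toy_daily)
r
coef(fit)
```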
By aligning Bellabeat’s marketing strategy with these smart device usage trends, the company can better meet the needs of its customers, enhance user engagement, and drive growth through targeted and personalized marketing efforts.