Introduction

This report presents an analysis for Bellabeat, a high-tech company that manufactures health-focused smart products. Bellabeat is a successful small company with the potential to become a larger player in the global smart device market. I have been asked to analyze smart device data to gain insight into how consumers use their smart devices. These insights will then help guide the company’s marketing strategy.

Data Overview

The data for this analysis comes from a publicly available Fitbit dataset, which can be found here: https://www.kaggle.com/datasets/arashnic/fitbit

Preparing for Analysis

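Load the tidyverse and other packages (run install.packages("tidyverse") first if it is not already installed). Loading the tidyverse also attaches dplyr and ggplot2, so the two explicit calls for them are optional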

library(tidyverse)
library(dplyr)
library(ggplot2)

Loading Data

Create a dataframe for daily activity data

daily_activity <- read.csv("dailyActivity_merged.csv")

Create a dataframe for sleep data

sleep_day <- read.csv("sleepDay_merged.csv")

Create a dataframe for weight data

weight_info <- read.csv("weightLogInfo_merged.csv")

Data Cleaning & Manipulation

Align the date columns across the datasets so they can be joined, using Id and Date together as the join key

First, make sure the date column is named ‘Date’ in every dataset (the weight data already uses this name)

daily_activity <- daily_activity %>%
  rename(Date = ActivityDate)
sleep_day <- sleep_day %>%
  rename(Date = SleepDay)

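Next, convert each date column to the Date class so the formats are identical. The format string matches only the month/day/year part, so any time component in the raw text is dropped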

sleep_day$Date <- as.Date(sleep_day$Date, format = "%m/%d/%Y")
daily_activity$Date <- as.Date(daily_activity$Date, format = "%m/%d/%Y")
weight_info$Date <- as.Date(weight_info$Date, format = "%m/%d/%Y")
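
A quick check after a conversion like this is to count how many values failed to parse. The sketch below uses made-up strings rather than the Fitbit files; the assumption is that the raw columns look roughly like these, some with a time stamp and some without.

```r
# Hypothetical raw values; the real columns are assumed to look similar.
raw_dates <- c("4/12/2016", "4/12/2016 12:00:00 AM", "5/2/2016 11:59:59 PM")

# as.Date() reads only as far as the format requires, so trailing
# time text is ignored.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Any NA here would mean a string did not match the format.
n_failed <- sum(is.na(parsed))
n_failed
```

Running the same `sum(is.na(...))` check on each converted Date column would confirm that no rows were silently turned into NA.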

Now these three datasets can be joined, using left joins to preserve every row from the largest set, daily activity

combined_data <- daily_activity %>% 
  left_join(sleep_day, by= c("Id", "Date")) %>% 
  left_join(weight_info, by=c("Id", "Date"))
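
A left join keeps every row on the left, but it also repeats a left-hand row whenever the right-hand table has more than one match for the same Id and Date. The sketch below shows one way to check for that, using small made-up tables (`activity_toy` and `sleep_toy` are hypothetical stand-ins, not the Fitbit data).

```r
library(dplyr)

# Toy stand-ins for daily_activity and sleep_day (hypothetical values).
activity_toy <- data.frame(
  Id = c(1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(9000, 11000, 4000)
)
sleep_toy <- data.frame(
  Id = c(1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(420, 420, 380)
)

# Keys that occur more than once would duplicate activity rows in a left join.
dup_keys <- sleep_toy %>%
  count(Id, Date) %>%
  filter(n > 1)

# Dropping exact duplicate rows before joining keeps one row per Id and Date.
joined <- activity_toy %>%
  left_join(distinct(sleep_toy), by = c("Id", "Date"))
```

If `dup_keys` is empty for the real sleep and weight tables, the joined data has exactly as many rows as the daily activity data.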

Summary for Steps per User

step_summary <- daily_activity %>%
  group_by(Id) %>%
  summarise(AverageSteps = mean(TotalSteps, na.rm= TRUE))

Summary for Sleep per User

sleep_summary<-sleep_day %>% 
  group_by(Id) %>% 
  summarise(AverageSleep= mean(TotalMinutesAsleep, na.rm= TRUE))

Summary for BMI per User

BMI_summary <- weight_info %>% 
  group_by(Id) %>% 
  summarise(AverageBMI= mean(BMI, na.rm= TRUE))

Summary for Calories per User

Calorie_summary <- daily_activity %>% 
  group_by(Id) %>% 
  summarise(AverageCalories = mean(Calories, na.rm = TRUE))

Summary for Weight per User

Weight_summary <- weight_info %>% 
  group_by(Id) %>% 
  summarise(AverageWeight= mean(WeightPounds, na.rm= TRUE))

Create a new column in sleep data to find time spent in bed but not asleep

sleep_day <- sleep_day %>% 
  mutate(TimeNotAsleep= TotalTimeInBed-TotalMinutesAsleep)

Summary for Time Not Asleep

TimeNotAsleep_Summary <- sleep_day %>%
  group_by(Id) %>%
  summarise(AverageTimeNotAsleep = mean(TimeNotAsleep, na.rm = TRUE))

Create a new Dataframe for Summary Data

summary_data <- step_summary %>%
  left_join(Calorie_summary, by = "Id") %>%
  left_join(sleep_summary, by = "Id") %>%
  left_join(BMI_summary, by = "Id") %>%
  left_join(Weight_summary, by = "Id") %>%
  left_join(TimeNotAsleep_Summary, by = "Id")

Create a new column in the summary data to flag whether each user reported any weight data

summary_data$ReportedWeight <- ifelse(is.na(summary_data$AverageWeight), "Not Reported", "Reported")

Calculate the share of users in each group as a percentage

weight_counts <- summary_data %>%
  count(ReportedWeight) %>%
  mutate(Percentage = round(100 * n / sum(n), 1))

Now the data is prepared for analysis

Data Analysis

Visualization for the relationship between Steps and Calories

This graph shows a positive correlation between steps and calories. The more steps a user takes in a day, the more calories they typically burn.

ggplot(data=combined_data, aes(x=TotalSteps, y=Calories)) + 
  geom_point()+
  geom_smooth(method = "lm", se = TRUE, color = "blue")+
  labs(title = "Relationship Between Steps and Calories")
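
The strength of a relationship like this can be put into a single number with a correlation coefficient. This is a minimal sketch on made-up values, not the Fitbit data; in this report the same call would be applied to the TotalSteps and Calories columns.

```r
# Hypothetical values standing in for the TotalSteps and Calories columns.
steps <- c(2000, 5000, 8000, 11000, 14000)
calories <- c(1700, 1900, 2300, 2400, 2900)

# Pearson correlation; use = "complete.obs" drops pairs with missing values.
r_steps_calories <- cor(steps, calories, use = "complete.obs")
r_steps_calories
```

A value close to 1 indicates a strong positive linear relationship, while a value near 0 indicates little or none.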

Visualization for the relationship between Steps and Sleep

Here we see a negative correlation between steps and sleep. Days with more time spent asleep tend to have fewer steps.

ggplot(data = combined_data, aes(x = TotalMinutesAsleep, y = TotalSteps)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  labs(title = "Relationship Between Steps and Sleep")

We know sleep is important to overall well-being, but in this data those who sleep more than 7 hours do not typically reach the commonly recommended 10,000 steps, and those who sleep more than 10 hours typically take fewer than 7,500 steps.
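
One way to check thresholds like these directly, rather than reading them off the trend line, is to group nights into sleep bands and summarise steps within each band. The sketch below uses made-up rows (`toy` is a hypothetical stand-in for the joined data), with band edges taken from the 7-hour and 10-hour cut-offs mentioned above.

```r
library(dplyr)

# Hypothetical rows standing in for the joined activity and sleep data.
toy <- data.frame(
  TotalMinutesAsleep = c(300, 400, 450, 500, 620, 650),
  TotalSteps = c(12000, 10500, 9000, 8000, 6000, 5000)
)

# Band sleep into up to 7 h, over 7 and up to 10 h, and over 10 h,
# then take the median steps and the number of nights in each band.
steps_by_sleep <- toy %>%
  mutate(SleepBand = cut(TotalMinutesAsleep / 60,
                         breaks = c(0, 7, 10, Inf),
                         labels = c("7 h or less", "7 to 10 h", "Over 10 h"))) %>%
  group_by(SleepBand) %>%
  summarise(MedianSteps = median(TotalSteps), Nights = n())
```

The Nights column matters as much as the median: a band with only a handful of nights should be read with caution.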

Visualization for the relationship between Average Steps vs Average BMI

We see a negative correlation between steps and BMI. The more steps a person takes, the lower their BMI tends to be, which is consistent with more steps correlating to more calories burned. This suggests that step count matters for overall health, although the pattern rests on the small share of users who reported weight data.

ggplot(summary_data, aes(x=AverageSteps, y= AverageBMI))+ 
  geom_point()+ 
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  labs(title = "Average Steps vs Average BMI")
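
Because only users with a weight log contribute a point to this plot, it helps to count those users and to look at the uncertainty around the correlation. The following is a sketch on made-up per-user averages (`summary_toy` is hypothetical), not a result from the Fitbit data.

```r
# Hypothetical per-user summary; NA marks users with no weight log.
summary_toy <- data.frame(
  AverageSteps = c(3000, 6000, 7500, 9000, 12000, 10000, 4000, 8000),
  AverageBMI = c(NA, 28.1, 27.0, 25.4, 22.9, NA, 30.2, NA)
)

# Users that actually contribute a point to the plot.
n_points <- sum(complete.cases(summary_toy[, c("AverageSteps", "AverageBMI")]))

# cor.test() drops incomplete pairs and reports a confidence interval
# alongside the estimate.
bmi_test <- cor.test(summary_toy$AverageSteps, summary_toy$AverageBMI)
bmi_test$conf.int
```

With few points the confidence interval tends to be wide, which is a reason to treat the trend as suggestive rather than conclusive.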

Insights

For Bellabeat, this means focusing its marketing strategy on promoting its products as a way to track steps and encourage more walking to achieve better overall health. From a product offering standpoint, Bellabeat should center that marketing on the Bellabeat app. Given the correlations between steps and calories, steps and sleep, and steps and BMI, the Bellabeat app can be an easy-to-use, centralized location for users to quickly access this important information.

What’s Next?

While Bellabeat’s suite of products offers a great foundation for tracking wellness data, the biggest area for improvement is in weight and BMI tracking.

Visualization for Percent of Users Reporting Weight

We see that fewer than 25% of users have weight data reported. Given that weight, and more specifically BMI, is a widely used indicator of overall wellness, this is a crucial metric for Bellabeat users to keep track of.

ggplot(weight_counts, aes(x = ReportedWeight, y = Percentage, fill = ReportedWeight)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(Percentage, "%")), vjust = 1.0, size = 5) +
  labs(title = "Percentage of Users Reporting Weight Data",
       x = "Weight Data Reported",
       y = "Percentage") +
  theme_minimal() +
  scale_fill_manual(values = c("Reported" = "steelblue", "Not Reported" = "grey"))

Visualization for manual vs automatic weight reporting

This graph shows how weight data is reported, with True = Manual and False = Automatic. Because the counts are taken per row, each weight log entry is counted once. We see the majority of entries (61.2%) were reported manually.

# table() drops NA by default, so only rows with a weight log entry are counted
counts <- table(combined_data$IsManualReport)
percent_labels <- round(100 * counts / sum(counts), 1)
labels <- paste0(names(counts), ": ", percent_labels, "%")
pie(counts,
    main = "Manual vs Automatic Reports",
    col = c("steelblue", "orange"),
    labels = labels)
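
Counting entries and counting users can give different answers when a few users log their weight far more often than others. The sketch below contrasts the two on a made-up weight log (`weight_toy` is hypothetical); the per-user rule used here, at least one manual entry, is an assumption chosen for illustration.

```r
library(dplyr)

# Hypothetical weight log: user 1 logs manually three times, user 2 syncs once.
weight_toy <- data.frame(
  Id = c(1, 1, 1, 2),
  IsManualReport = c(TRUE, TRUE, TRUE, FALSE)
)

# Share of log entries that were manual.
entry_share <- mean(weight_toy$IsManualReport)

# Share of users with at least one manual entry.
user_share <- weight_toy %>%
  group_by(Id) %>%
  summarise(AnyManual = any(IsManualReport)) %>%
  summarise(Share = mean(AnyManual)) %>%
  pull(Share)
```

In this toy log, three of the four entries are manual but only one of the two users reports manually, so the two shares differ.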

Conclusion

Bellabeat should focus on enhancing the weight tracking experience for its users. It can start by conducting an internal review of the current weight tracking systems, assessing the layout for user-friendliness and general interactivity. This may also include conducting a survey to determine why more users are not reporting their weight data. Furthermore, Bellabeat should consider adding a new add-on product: a company-specific scale that connects wirelessly to the Bellabeat app, allowing users to effortlessly report and track their weight.