I am a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company that has the potential to become an even larger player in the global smart device market. Urška Sršen, co-founder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. I have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights that I discover will then help guide marketing strategy for the company. I will present my analysis to the Bellabeat executive team along with my recommendations for Bellabeat’s marketing strategy.
Bellabeat App: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to the company's line of smart wellness products.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace or clip. The Leaf Tracker connects to the Bellabeat app to track activity, sleep and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
Bellabeat Membership: Bellabeat also offers a subscription-based membership program for users. Membership gives 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
I was asked to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. I will then apply these insights to one of the Bellabeat products.
1.1 Business Task
The business task is to analyze usage data from non-Bellabeat smart devices to gain insight into relevant consumer trends, and to discover how those trends can direct future Bellabeat marketing strategies. These insights will be applied to the Bellabeat application and future products in order to maximize profits and growth for the company and to capitalize on the rapidly growing consumer base in the smart device and wellness space. The stakeholders (co-founders Urška Sršen and Sando Mur, the Bellabeat executive team, and the Bellabeat marketing analytics team) will all use this analysis to make those final decisions.
Sršen encouraged me to use the following public data that explores smart device users’ daily habits: FitBit Fitness Tracker Data (Public Domain, dataset made available through Mobius).
This data set contains personal fitness tracker data from thirty FitBit users. Thirty eligible FitBit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. The data also includes information about daily activities, steps, and heart rate that can be used to explore users’ habits.
2.1 Notes about the Data
Eighteen data sets were provided in the FitBit Fitness Tracker Data link, each stored as an individual .csv file. This analysis will focus on three of them: the daily activity data set (‘daily_activity’), which contains merged data from other provided files such as daily calories, daily intensities, and daily steps; the weight data set (‘weight’); and the daily sleep data set (‘sleep’). These files contain data that are also tracked by Bellabeat products, so they should provide the most relevant and useful insights for the business task at hand.
2.2 Issues with the Data Credibility
I used ROCCC to determine credibility and bias issues with the data set.
Reliable: The data contains thirty unique individuals out of a total of more than thirty million. A sample of thirty meets the usual rule-of-thumb minimum for applying the Central Limit Theorem (CLT), so the analysis is still workable. However, at a 90% or 95% confidence level this equates to a margin of error of roughly 15% or 18%, respectively, which is not ideal. A sample ten times this size would provide better insight. The data was also collected over only one month; a longer collection period would provide more accurate and reliable information. NOT Reliable.
Original: This data set did not come from within Bellabeat. The dataset was generated by a distributed survey via Amazon Mechanical Turk. NOT Original.
Comprehensive: More details about the thirty individuals chosen, such as age and height, would help assess the bias. Having this information would make for a more comprehensive, helpful and accurate result. Bellabeat is a fitness company for women, so an unbiased data set about women would be even better. NOT Comprehensive.
Current: The data set was obtained more than five years ago which isn’t necessarily representative of any current trends. NOT Current.
Cited: The data set was cited, but the validity of Amazon Mechanical Turk isn’t known. More research is needed to make it credible. NOT Cited.
Overall, the integrity and credibility of the data set are not high enough for me to be confident in it. However, the general insights can still reveal useful shortcomings that we can avoid in the marketing of Bellabeat products.
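The margin-of-error figures quoted under "Reliable" can be reproduced with a short calculation. This is a minimal sketch, assuming a simple random sample, the worst-case proportion p = 0.5, and a population large enough that the finite population correction can be ignored. The helper name margin_of_error is hypothetical and not part of any package.

```r
# Margin of error for a sample proportion (normal approximation).
# Assumptions: simple random sample, worst-case p = 0.5, very large population.
margin_of_error <- function(n, conf_level, p = 0.5) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  z * sqrt(p * (1 - p) / n)
}

moe_90  <- margin_of_error(30, 0.90)   # thirty users, 90% confidence
moe_95  <- margin_of_error(30, 0.95)   # thirty users, 95% confidence
moe_300 <- margin_of_error(300, 0.95)  # a sample ten times larger

round(c(moe_90, moe_95, moe_300) * 100, 1)
```

With thirty users the margin of error comes out at roughly 15% and 18%, and a sample ten times larger brings it below 6% at the 95% level.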
2.3 Installing Packages
Below are the packages that I will or may need in this case study.
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("tidyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("readr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("lubridate")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
2.4 Loading Packages
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(tidyr)
library(readr)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(ggplot2)
2.5 Importing Data Sets
daily_activity <- read.csv("dailyActivity_merged.csv")
weight <- read.csv("weightLogInfo_merged.csv")
sleep <- read.csv("sleepDay_merged.csv")
3. Process
Now that the data is prepared, I will begin the processing step. Here, I will verify the data, then clean and transform it for analysis.
3.1 Verifying Data
I will now verify the data sets I’ve imported and check them for errors.
head(daily_activity)
head(weight)
head(sleep)
colnames(daily_activity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
colnames(weight)
## [1] "Id" "Date" "WeightKg" "WeightPounds"
## [5] "Fat" "BMI" "IsManualReport" "LogId"
colnames(sleep)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
I noticed that the logging/tracking of the data is not consistent. Some people forgot to wear their FitBits, which recorded zero steps for certain days; this will skew any analysis, so I will remove the zeros from the data set. Some people did not record their sleep or weight, and some did not participate for the whole duration of the study. This will make a complete and in-depth analysis more difficult to conduct than originally thought.
daily_activity_new <- daily_activity %>%
filter(TotalSteps !=0)
Removing the rows with zero steps should help with the analysis. Some very low step counts are still present, as are some very low values for calories burnt. I will keep these in the data set for analysis, because those individuals may genuinely have recorded that data on those days. The uncertainty of the data makes it less reliable than is ideal.
I also noticed that the sleep data set and weight data set both contain the date and time in one column. It is best to separate these into two columns, “Date” and “Time”, in case I decide to use the date to analyze the data across the three files. However, whilst viewing the data sets, I noticed a large discrepancy in the number of unique IDs present, as well as inconsistencies in the daily logging/tracking of the individuals’ weight and sleep.
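A quick check of how many rows the zero-step filter removes, and how many very low days remain, could look like the sketch below. The data frame toy_activity and the cut-off low_step_threshold are made up for illustration and are not taken from the FitBit files.

```r
library(dplyr)

# Hypothetical stand-in for daily_activity; values are invented
toy_activity <- data.frame(
  Id = c(1, 1, 2, 2, 3),
  TotalSteps = c(0, 8500, 4, 12000, 0)
)

# Same filter as in the analysis: drop days with zero recorded steps
toy_activity_new <- toy_activity %>%
  filter(TotalSteps != 0)

rows_removed <- nrow(toy_activity) - nrow(toy_activity_new)

# Assumed cut-off for a "suspiciously low" day (an illustration only)
low_step_threshold <- 100
low_step_days <- toy_activity_new %>%
  filter(TotalSteps < low_step_threshold) %>%
  nrow()
```

Counting the low days rather than deleting them keeps the decision about whether they are legitimate visible to the reader.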
weight_new <- weight %>%
separate(Date, c("Date", "Time"), " ")
## Warning: Expected 2 pieces. Additional pieces discarded in 67 rows [1, 2, 3, 4,
## 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
sleep_new <- sleep %>%
separate(SleepDay, c("Date", "Time"), " ")
## Warning: Expected 2 pieces. Additional pieces discarded in 413 rows [1, 2, 3, 4,
## 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
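The warnings above most likely arise because each timestamp has three space-separated pieces (date, time, and an AM/PM marker), so the third piece is discarded. The following is a hedged sketch of one way to keep the marker, using the extra = "merge" argument of tidyr's separate(). The sample rows are hypothetical and only mimic the assumed "m/d/yyyy h:mm:ss AM/PM" format.

```r
library(tidyr)

# Hypothetical sample in the assumed timestamp format
sample_log <- data.frame(
  Id = c(1, 2),
  Date = c("5/2/2016 11:59:59 PM", "4/12/2016 6:12:00 AM"),
  stringsAsFactors = FALSE
)

# extra = "merge" splits only once, so "11:59:59 PM" stays together in Time
split_log <- separate(sample_log, Date, c("Date", "Time"),
                      sep = " ", extra = "merge")
```

Because this analysis only uses the Date column, dropping AM/PM does not change the results here, but keeping it would matter if the times were ever compared.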
n_distinct(daily_activity_new$Id)
## [1] 33
n_distinct(weight_new$Id)
## [1] 8
n_distinct(sleep_new$Id)
## [1] 24
Not everyone involved in this survey provided tracking data for each data set. Only eight people entered their weight, and only two of them continued to log it regularly. Only twenty-four people entered their sleep data. There are thirty-three people recorded in the daily activity data, despite the data citation saying there are thirty people in the sample. This calls into question how reliable this data actually is, and it makes cross-analyzing the data sets less trustworthy because so much of the tracked data is incomplete or inconsistent. I will mainly focus on the ‘daily_activity’ data set for a more focused analysis and include some general recommendations for improving the consistency with which users log and track their data.
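One way to see how unevenly users logged their data is to count the entries per user. The sketch below uses dplyr's count() on a made-up data frame (toy_weight). It does not reproduce the real counts from the weight file.

```r
library(dplyr)

# Hypothetical weight log: three users with very different logging habits
toy_weight <- data.frame(
  Id = c(101, 101, 101, 102, 103, 103, 103, 103),
  WeightKg = c(60.1, 60.0, 59.8, 72.5, 85.0, 84.7, 84.9, 84.6)
)

# Number of log entries per user, most frequent loggers first
logs_per_user <- toy_weight %>%
  count(Id, name = "n_logs", sort = TRUE)
```

Running the same count on weight_new and sleep_new would show directly which users account for most of the entries.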
I also noticed that there could be some duplicated rows in some of the data sets. I will confirm this and delete the duplicated rows for cleaner data.
nrow(daily_activity_new)
## [1] 863
nrow(weight_new)
## [1] 67
nrow(sleep_new)
## [1] 413
nrow(unique(daily_activity_new))
## [1] 863
nrow(unique(weight_new))
## [1] 67
nrow(unique(sleep_new))
## [1] 410
I will now create a new sleep data set with only unique rows, since the sleep data was the only set that contained duplicates (413 rows, of which 410 are unique).
sleep_daily <- unique(sleep_new)
view(weight_new)
view(sleep_daily)
view(daily_activity_new)
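The duplicate check can also be written so that it reports how many rows are exact repeats before they are dropped. This is a small sketch on a hypothetical data frame (toy_sleep), using base R's duplicated() and dplyr's distinct().

```r
library(dplyr)

# Hypothetical sleep log in which the second row repeats the first
toy_sleep <- data.frame(
  Id = c(1, 1, 2, 2),
  Date = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(400, 400, 380, 410)
)

# Count rows that are exact copies of an earlier row
n_duplicates <- sum(duplicated(toy_sleep))

# Keep one copy of each row; equivalent to unique() for data frames
toy_sleep_unique <- distinct(toy_sleep)
```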
I will now identify trends and relationships that I find while I analyze the data. Hopefully I can discover valuable insights from my analysis that can help answer the questions asked.
I realized that I needed to install the skimr package to look through and analyze the data. I will now take a look at the detailed summary of each set.
install.packages("skimr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(skimr)
skim_without_charts(daily_activity_new)
| Name | daily_activity_new |
| Number of rows | 863 |
| Number of columns | 15 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 14 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| ActivityDate | 0 | 1 | 8 | 9 | 0 | 31 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| Id | 0 | 1 | 4.857542e+09 | 2.418405e+09 | 1503960366 | 2.320127e+09 | 4.445115e+09 | 6.962181e+09 | 8.877689e+09 |
| TotalSteps | 0 | 1 | 8.319390e+03 | 4.744970e+03 | 4 | 4.923000e+03 | 8.053000e+03 | 1.109250e+04 | 3.601900e+04 |
| TotalDistance | 0 | 1 | 5.980000e+00 | 3.720000e+00 | 0 | 3.370000e+00 | 5.590000e+00 | 7.900000e+00 | 2.803000e+01 |
| TrackerDistance | 0 | 1 | 5.960000e+00 | 3.700000e+00 | 0 | 3.370000e+00 | 5.590000e+00 | 7.880000e+00 | 2.803000e+01 |
| LoggedActivitiesDistance | 0 | 1 | 1.200000e-01 | 6.500000e-01 | 0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 4.940000e+00 |
| VeryActiveDistance | 0 | 1 | 1.640000e+00 | 2.740000e+00 | 0 | 0.000000e+00 | 4.100000e-01 | 2.270000e+00 | 2.192000e+01 |
| ModeratelyActiveDistance | 0 | 1 | 6.200000e-01 | 9.100000e-01 | 0 | 0.000000e+00 | 3.100000e-01 | 8.700000e-01 | 6.480000e+00 |
| LightActiveDistance | 0 | 1 | 3.640000e+00 | 1.860000e+00 | 0 | 2.340000e+00 | 3.580000e+00 | 4.890000e+00 | 1.071000e+01 |
| SedentaryActiveDistance | 0 | 1 | 0.000000e+00 | 1.000000e-02 | 0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.100000e-01 |
| VeryActiveMinutes | 0 | 1 | 2.302000e+01 | 3.365000e+01 | 0 | 0.000000e+00 | 7.000000e+00 | 3.500000e+01 | 2.100000e+02 |
| FairlyActiveMinutes | 0 | 1 | 1.478000e+01 | 2.043000e+01 | 0 | 0.000000e+00 | 8.000000e+00 | 2.100000e+01 | 1.430000e+02 |
| LightlyActiveMinutes | 0 | 1 | 2.100200e+02 | 9.678000e+01 | 0 | 1.465000e+02 | 2.080000e+02 | 2.720000e+02 | 5.180000e+02 |
| SedentaryMinutes | 0 | 1 | 9.557500e+02 | 2.802900e+02 | 0 | 7.215000e+02 | 1.021000e+03 | 1.189000e+03 | 1.440000e+03 |
| Calories | 0 | 1 | 2.361300e+03 | 7.027100e+02 | 52 | 1.855500e+03 | 2.220000e+03 | 2.832000e+03 | 4.900000e+03 |
skim_without_charts(weight_new)
| Name | weight_new |
| Number of rows | 67 |
| Number of columns | 9 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 6 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Date | 0 | 1 | 8 | 9 | 0 | 31 | 0 |
| Time | 0 | 1 | 7 | 8 | 0 | 26 | 0 |
| IsManualReport | 0 | 1 | 4 | 5 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| Id | 0 | 1.00 | 7.009282e+09 | 1.950322e+09 | 1.503960e+09 | 6.962181e+09 | 6.962181e+09 | 8.877689e+09 | 8.877689e+09 |
| WeightKg | 0 | 1.00 | 7.204000e+01 | 1.392000e+01 | 5.260000e+01 | 6.140000e+01 | 6.250000e+01 | 8.505000e+01 | 1.335000e+02 |
| WeightPounds | 0 | 1.00 | 1.588100e+02 | 3.070000e+01 | 1.159600e+02 | 1.353600e+02 | 1.377900e+02 | 1.875000e+02 | 2.943200e+02 |
| Fat | 65 | 0.03 | 2.350000e+01 | 2.120000e+00 | 2.200000e+01 | 2.275000e+01 | 2.350000e+01 | 2.425000e+01 | 2.500000e+01 |
| BMI | 0 | 1.00 | 2.519000e+01 | 3.070000e+00 | 2.145000e+01 | 2.396000e+01 | 2.439000e+01 | 2.556000e+01 | 4.754000e+01 |
| LogId | 0 | 1.00 | 1.461772e+12 | 7.829948e+08 | 1.460444e+12 | 1.461079e+12 | 1.461802e+12 | 1.462375e+12 | 1.463098e+12 |
skim_without_charts(sleep_daily)
| Name | sleep_daily |
| Number of rows | 410 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Date | 0 | 1 | 8 | 9 | 0 | 31 | 0 |
| Time | 0 | 1 | 8 | 8 | 0 | 1 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| Id | 0 | 1 | 4.994963e+09 | 2.060863e+09 | 1503960366 | 3.977334e+09 | 4702921684.0 | 6962181067 | 8792009665 |
| TotalSleepRecords | 0 | 1 | 1.120000e+00 | 3.500000e-01 | 1 | 1.000000e+00 | 1.0 | 1 | 3 |
| TotalMinutesAsleep | 0 | 1 | 4.191700e+02 | 1.186400e+02 | 58 | 3.610000e+02 | 432.5 | 490 | 796 |
| TotalTimeInBed | 0 | 1 | 4.584800e+02 | 1.274600e+02 | 61 | 4.037500e+02 | 463.0 | 526 | 961 |
This provides a nice overview of the cleaned data and shows whether any issues stand out before the analysis. It looks good, but I would like to condense each data set into the most reliable and relevant columns for a more focused analysis.
daily_activity_final <- daily_activity_new %>%
select(Id, ActivityDate, TotalSteps, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories) %>%
rename(Date = ActivityDate)
weight_final <- weight_new %>%
select(Id, Date, BMI, WeightPounds, IsManualReport)
sleep_final <- sleep_daily %>%
select(Id, Date, TotalMinutesAsleep, TotalTimeInBed)
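Because the trimmed data sets now share Id and Date columns, they could be combined if a cross-file comparison were needed. The sketch below is only an illustration on hypothetical data, and it assumes the date strings use the same format in both tables.

```r
library(dplyr)

# Hypothetical activity and sleep tables keyed by Id and Date
toy_activity <- data.frame(
  Id = c(1, 1, 2),
  Date = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(8000, 9500, 4000)
)
toy_sleep <- data.frame(
  Id = c(1, 2),
  Date = c("4/12/2016", "4/12/2016"),
  TotalMinutesAsleep = c(420, 390)
)

# Keep only the user-days that appear in both tables
activity_sleep <- inner_join(toy_activity, toy_sleep, by = c("Id", "Date"))
```

An inner join drops every user-day without a sleep record, so the combined table would be smaller than the activity table.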
Next I want to take a look at a more specific summary of the values.
summary(daily_activity_final)
## Id Date TotalSteps VeryActiveMinutes
## Min. :1.504e+09 Length:863 Min. : 4 Min. : 0.00
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 4923 1st Qu.: 0.00
## Median :4.445e+09 Mode :character Median : 8053 Median : 7.00
## Mean :4.858e+09 Mean : 8319 Mean : 23.02
## 3rd Qu.:6.962e+09 3rd Qu.:11092 3rd Qu.: 35.00
## Max. :8.878e+09 Max. :36019 Max. :210.00
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 52
## 1st Qu.: 0.00 1st Qu.:146.5 1st Qu.: 721.5 1st Qu.:1856
## Median : 8.00 Median :208.0 Median :1021.0 Median :2220
## Mean : 14.78 Mean :210.0 Mean : 955.8 Mean :2361
## 3rd Qu.: 21.00 3rd Qu.:272.0 3rd Qu.:1189.0 3rd Qu.:2832
## Max. :143.00 Max. :518.0 Max. :1440.0 Max. :4900
summary(weight_final)
## Id Date BMI WeightPounds
## Min. :1.504e+09 Length:67 Min. :21.45 Min. :116.0
## 1st Qu.:6.962e+09 Class :character 1st Qu.:23.96 1st Qu.:135.4
## Median :6.962e+09 Mode :character Median :24.39 Median :137.8
## Mean :7.009e+09 Mean :25.19 Mean :158.8
## 3rd Qu.:8.878e+09 3rd Qu.:25.56 3rd Qu.:187.5
## Max. :8.878e+09 Max. :47.54 Max. :294.3
## IsManualReport
## Length:67
## Class :character
## Mode :character
##
##
##
summary(sleep_final)
## Id Date TotalMinutesAsleep TotalTimeInBed
## Min. :1.504e+09 Length:410 Min. : 58.0 Min. : 61.0
## 1st Qu.:3.977e+09 Class :character 1st Qu.:361.0 1st Qu.:403.8
## Median :4.703e+09 Mode :character Median :432.5 Median :463.0
## Mean :4.995e+09 Mean :419.2 Mean :458.5
## 3rd Qu.:6.962e+09 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :8.792e+09 Max. :796.0 Max. :961.0
3.1 Trends
The median Total Steps per recorded day is 8,053 (the mean is 8,319).
The mean daily minutes are 23.02 for Very Active, 14.78 for Fairly Active, 210 for Lightly Active, and 955.8 for Sedentary.
The mean BMI is 25.19 (the median is 24.39).
The mean minutes asleep is 419.2 and the mean minutes in bed is 458.5 (the medians are 432.5 and 463.0).
Again there are outliers in the data that were not removed due to the lack of information. These were kept in, in case those extreme values were in fact legitimate. However, in the case that those values were not legitimate, the average values above will be skewed.
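The effect of an extreme value on the average is easy to demonstrate. In this minimal sketch the step counts are invented, with one very large day standing in for an outlier.

```r
# Hypothetical daily step counts with one extreme day
steps <- c(4000, 6000, 8000, 9000, 36000)

mean_steps   <- mean(steps)    # pulled upward by the extreme day
median_steps <- median(steps)  # unaffected by how large the extreme day is
```

Here the mean is 12,600 while the median stays at 8,000, which is why the median is the safer summary when outliers cannot be verified.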
Trends I noticed were that users were not consistent in logging their data and certain individuals who were consistently logging their data were not losing weight or seeing results over the duration of the data collection.
3.2 Conclusions from the Trends
According to a joint research investigation by the National Cancer Institute (NCI), the National Institute on Aging (NIA), and the Centers for Disease Control and Prevention (CDC) (amongst other research studies), the recommended daily number of Total Steps is 10,000. With a mean of 8,319 and a median of 8,053 steps per day, the average individual here is not reaching that goal.
One reason for this is their activity level. The individuals spent on average 955.8 minutes a day being sedentary, which is roughly 16 hours a day.
Since the mean BMI is 25.19, these individuals fall just inside the overweight category (a BMI of 25 or higher), according to the World Health Organisation (WHO).
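The sedentary figures can be checked from the mean minutes reported by summary(). This sketch assumes the four activity categories together make up the total tracked minutes per day.

```r
# Mean daily minutes per activity level, taken from summary(daily_activity_final)
mean_minutes <- c(
  very_active    = 23.02,
  fairly_active  = 14.78,
  lightly_active = 210.0,
  sedentary      = 955.8
)

sedentary_hours <- mean_minutes[["sedentary"]] / 60
sedentary_share <- mean_minutes[["sedentary"]] / sum(mean_minutes)

round(sedentary_hours, 1)      # hours per day
round(sedentary_share * 100)   # percent of tracked minutes
```

This gives about 15.9 hours a day, or roughly 79% of the tracked minutes.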
It makes sense that people who have a higher BMI are wearing FitBits. They have taken the first step in their health journey. They are not being active enough to see change. The more active someone is, the more steps they will take and the more calories they will burn. By doing this they will then lower their BMI with work over a certain period of time.
Furthermore, the average person is getting just under the minimum of 7 hours of sleep recommended by the National Sleep Foundation (NSF): the mean time asleep is 419.2 minutes, or about 6.99 hours. On a positive note, the individuals spend only about 39 minutes a day in bed without sleeping (458.5 minutes in bed minus 419.2 minutes asleep).
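The sleep figures can be converted with a short calculation based on the means from summary(sleep_final). Note that time in bed minus time asleep includes any time awake in bed, not only the time taken to fall asleep.

```r
# Mean values from summary(sleep_final), in minutes
mean_minutes_asleep <- 419.2
mean_minutes_in_bed <- 458.5

hours_asleep         <- mean_minutes_asleep / 60
minutes_awake_in_bed <- mean_minutes_in_bed - mean_minutes_asleep

round(hours_asleep, 2)          # hours of sleep per night
round(minutes_awake_in_bed, 1)  # minutes in bed but not asleep
```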
3.3 Questions to Answer
5.1 Revisiting Business Task
The business task is to analyze smart device usage data of non-Bellabeat smart devices to gain insight into relevant consumer trends within the global smart device market. We will also try to discover how to apply these trends to Bellabeat customers and to influence future Bellabeat marketing strategies. This will be done by applying said insights to the Bellabeat App and to future products in order to maximize profits and growth for the company and to capitalize on Bellabeat’s rapidly growing consumer base in the smart device/tech-wellness space.
5.2 Trends Identified
The median Total Steps per day for the participating individuals was 8,053, which is almost 2,000 steps below the minimum Total Steps per day suggested by the NCI, NIA and the CDC.
On average, 79% of total minutes per day were spent being sedentary by the participating individuals over the course of a month.
The individuals had an average BMI of 25.19, which puts them into the overweight category.
On average, these individuals slept slightly less than the suggested minimum of 7 hours of sleep.
These individuals were not consistent with logging/tracking their data each day over the course of the month, and some individuals didn’t log/track their sleep or weight (only twenty-four unique users input sleep and eight for weight - only two of the eight made up the majority of the inputs).
Over the month of data collection, the individuals did not lose weight, lower their BMI, improve their sleep quality, or increase their activity levels.
5.3 Answering Questions & Recommendations
Bellabeat could offer an incentive for daily tracking. For example, an in-app competition against other users or friends, along with badges and certificates, could help with consistency.
Through the competition users could win t-shirts, koozies, etc., which provides more marketing. For a yearly competition we could give away another Bellabeat product, which could lead to more data and healthier users.
Bellabeat could also offer additional points during the weekend to incentivize logging at times when data is traditionally not logged.
Bellabeat could use a TDEE (Total Daily Energy Expenditure) calculator that takes age, weight, height and other information as input to create accurate and uniform results.
The calculator can help determine the caloric deficit the user needs to meet their goals. For example, if your TDEE is two thousand calories, you would need a daily caloric deficit of five hundred calories (eating about fifteen hundred) to lose roughly one pound a week, by the common rule of thumb.
The user could sign up for push notifications to provide assistance in reaching their caloric goals, and when they have been sedentary for too long.
The app could provide a list of activities and exercises to do outside of the gym since most users were lightly active and sedentary for a majority of the time.
Bellabeat could also provide nutritional and exercise coaching with a paid membership, like Peloton without the bike or equipment.
Bellabeat should track sleeping habits automatically with the user’s consent.
Using the Leaf product together with in-app notifications could provide a better analysis of the user’s ideal sleep schedule.
The app could notify the user the ideal time to get off their phone before bed, and when they should be in bed based on their sleep schedule.
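The TDEE recommendation above can be illustrated with a small calculation. This is only a sketch of one common approach, not Bellabeat's method: it uses the Mifflin-St Jeor equation for women to estimate basal metabolic rate (BMR) and multiplies it by an activity factor. The user profile, the activity factor of 1.375 for a lightly active person, and the helper name bmr_women are all assumptions made for illustration.

```r
# Mifflin-St Jeor estimate of basal metabolic rate for women, in kcal/day
bmr_women <- function(weight_kg, height_cm, age_years) {
  10 * weight_kg + 6.25 * height_cm - 5 * age_years - 161
}

# Hypothetical user profile
bmr <- bmr_women(weight_kg = 65, height_cm = 165, age_years = 30)

# Assumed multiplier for a lightly active person
activity_factor <- 1.375
tdee <- bmr * activity_factor

# A 500 kcal daily deficit adds up to about 3,500 kcal a week,
# which the common rule of thumb equates to roughly one pound
daily_deficit  <- 500
calorie_target <- tdee - daily_deficit
weekly_deficit <- daily_deficit * 7
```

In the app, the activity factor could be chosen from the tracked activity minutes rather than entered by the user.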
5.4 Future Works
If this data set were to be collected again, I would like to see the following parameters met in order to create a flawless and in-depth analysis of this type of data:
A larger sample size with more responsive users would raise the confidence level and lower the margin of error.
Having a longer data collection period of at least six months.
Collecting the data in house or with a reliable third party.
More information from each user including age and height.
Ensuring that the data is collected without bias.
A more relevant data set would also show more relevant results.