Data Analytics Case Study

How Can A Wellness Technology Company Play It Smart

Scenario

I am a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company that has the potential to become a larger player in the global smart device market. Urška Sršen, co-founder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. I have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights that I discover will then help guide marketing strategy for the company. I will present my analysis to the Bellabeat executive team along with my recommendations for Bellabeat’s marketing strategy.

Bellabeat Products

Bellabeat App: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace or clip. The Leaf Tracker connects to the Bellabeat app to track activity, sleep and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
Bellabeat Membership: Bellabeat also offers a subscription-based membership program for users. Membership gives 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

1. Ask

I was asked to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. I will then apply these insights to one of the Bellabeat products.

1.1 Business Task

The business task is to analyze usage data from non-Bellabeat smart devices to gain insight into relevant consumer trends, and to discover how that data can direct future Bellabeat marketing strategies. These insights will be applied to the Bellabeat application and future products in order to maximize profits and growth for the company and to capitalize on the rapidly growing consumer base in the smart device and wellness space. The stakeholders (co-founders Urška Sršen and Sando Mur, the Bellabeat executive team, and the Bellabeat marketing analytics team) will all use this analysis to make those final decisions.

2. Prepare

Sršen encouraged me to use the following public data that explores smart device users’ daily habits: FitBit Fitness Tracker Data (Public Domain, dataset made available through Mobius).

This data set contains personal fitness tracker data from thirty FitBit users. Thirty eligible FitBit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. The data also includes information about daily activities, steps, and heart rate that can be used to explore users’ habits.

2.1 Notes about the Data

Eighteen data sets were provided in the FitBit Fitness Tracker Data link, each stored as an individual .csv file. This analysis will focus on three data sets: the daily activity data set (‘daily_activity’), which contains merged data from other provided files like daily calories, daily intensities, and daily steps; the weight data set (‘weight’); and the daily sleep data set (‘sleep’). These files contain data that are also tracked by Bellabeat products, so they should provide the most relevant and useful insights for the business task at hand.

2.2 Issues with the Data Credibility

I used the ROCCC criteria to determine credibility and bias issues with the data set.

Reliable: The data contains thirty unique individuals out of a population of more than thirty million. A sample of thirty meets the usual rule-of-thumb minimum for the Central Limit Theorem (CLT), so it is still usable. However, it equates to a 15-18% margin of error at a 90-95% confidence level, respectively, which is not ideal. A sample size ten times larger would provide better insight. The data was also collected over only one month; a longer collection period would provide more accurate and reliable information. NOT Reliable.
Original: This data set did not come from Bellabeat. The dataset was generated by a survey distributed via Amazon Mechanical Turk. NOT Original.
Comprehensive: More details about the thirty individuals, such as age and height amongst other things, would help assess bias and would make for a more comprehensive, helpful and accurate result. Bellabeat is a fitness company for women, so an unbiased dataset about women would be even better. NOT Comprehensive.
Current: The data set was obtained more than five years ago, so it isn’t necessarily representative of current trends. NOT Current.
Cited: The data set was cited, but the validity of data collected through Amazon Mechanical Turk isn’t known. More research is needed to make it credible. NOT Cited.

Overall, the integrity and credibility of the dataset are not high enough for me to be confident in it. However, the general insights can still point to shortcomings we can avoid in the marketing of Bellabeat products.
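The margin-of-error figures quoted under Reliable can be reproduced with a short calculation. This is a minimal sketch, assuming the usual normal-approximation formula for a sample proportion with the conservative value p = 0.5; the helper name margin_of_error is hypothetical and not part of the original analysis.

```r
# Margin of error for a sample proportion (normal approximation).
# p = 0.5 is an assumption: it gives the largest possible margin.
margin_of_error <- function(n, conf_level, p = 0.5) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  z * sqrt(p * (1 - p) / n)
}

moe_90 <- margin_of_error(n = 30, conf_level = 0.90)
moe_95 <- margin_of_error(n = 30, conf_level = 0.95)
round(100 * c(moe_90, moe_95), 1)  # 15.0 17.9

# Ten times the sample size, at 95% confidence
moe_95_n300 <- margin_of_error(n = 300, conf_level = 0.95)
round(100 * moe_95_n300, 1)  # 5.7
```

With thirty users the margin is roughly 15% at 90% confidence and 18% at 95% confidence; with three hundred users it would drop to under 6% at 95% confidence.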

2.3 Installing Packages

Below are the packages that I will or may need in this case study.

install.packages("tidyverse")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("dplyr")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("janitor")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("tidyr")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("readr")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("lubridate")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("ggplot2")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

2.4 Loading Packages

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(dplyr)
library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(tidyr)
library(readr)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(ggplot2)

2.5 Importing Data Sets

daily_activity <- read.csv("dailyActivity_merged.csv")
weight <- read.csv("weightLogInfo_merged.csv")
sleep <- read.csv("sleepDay_merged.csv")

3. Process

Now that the data is prepared, I will begin the processing step. Here, I will verify the data, then clean and transform it for analysis.

3.1 Verifying Data

Now I will verify the datasets I’ve imported and check them for errors.

head(daily_activity)

head(weight)

head(sleep)

colnames(daily_activity)

##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"

colnames(weight)

## [1] "Id"             "Date"           "WeightKg"       "WeightPounds"  
## [5] "Fat"            "BMI"            "IsManualReport" "LogId"

colnames(sleep)

## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"

I noticed that the logging/tracking of the data is not consistent. Some people forgot to wear their FitBits, which recorded zero steps for certain days; this will skew any analysis, so I will remove the zero-step rows from the data set. Some people did not record their sleep or weight. Some people did not participate for the whole duration. This will make a complete and in-depth analysis more difficult to conduct than originally thought.

daily_activity_new <- daily_activity %>%
   filter(TotalSteps !=0)

Removing the rows with zero steps will help with the analysis. There are still some very low step counts present, as well as very low values for calories burned. I will keep these in the data set for analysis, because those individuals may genuinely have recorded that data for those days. The uncertainty of the data makes it less reliable than is ideal.

I also noticed that the sleep data set and weight data set both contain the date and time in one column. It is best to separate these into two columns “Date” and “Time”, if I do decide to use the date as a way to analyze the data between the three files. However, whilst viewing the data sets, I noticed a large discrepancy in the number of unique IDs present, as well as inconsistencies in the daily logging/tracking of the individual’s weight and sleep.

weight_new <- weight %>%
   separate(Date, c("Date", "Time"), " ")

## Warning: Expected 2 pieces. Additional pieces discarded in 67 rows [1, 2, 3, 4,
## 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

sleep_new <- sleep %>%
   separate(SleepDay, c("Date", "Time"), " ")

## Warning: Expected 2 pieces. Additional pieces discarded in 413 rows [1, 2, 3, 4,
## 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
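The two warnings above suggest that each timestamp splits into three pieces on spaces (date, clock time, and an AM/PM marker), so the third piece is dropped from the Time column. The sketch below assumes timestamps shaped like "4/12/2016 12:00:00 AM", which is inferred from the warning rather than shown in the output, and uses hypothetical toy rows. It relies on the extra = "merge" option of tidyr's separate() to keep the marker attached to the time.

```r
library(tidyr)

# Hypothetical rows in the assumed "m/d/Y h:m:s AM/PM" layout
toy <- data.frame(
  Id = c(1, 2),
  Date = c("4/12/2016 12:00:00 AM", "5/2/2016 11:59:59 PM")
)

# extra = "merge" splits only at the first space, so "AM"/"PM"
# stays with the time instead of being discarded with a warning
toy_split <- separate(toy, Date, into = c("Date", "Time"),
                      sep = " ", extra = "merge")
toy_split$Time  # "12:00:00 AM" "11:59:59 PM"
```

For this analysis only the Date column is used later, so the dropped marker does not change the results, but keeping it would matter if the time of day were analyzed.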

n_distinct(daily_activity_new$Id)

## [1] 33

n_distinct(weight_new$Id)

## [1] 8

n_distinct(sleep_new$Id)

## [1] 24

Not everyone involved in this survey provided tracking data for each data set. Only eight people entered their weight, and only two continued to log their daily metrics. Only twenty-four people entered their sleep data. There are thirty-three people recorded in the daily activity data, despite the data citation saying there are thirty people in the sample. This calls into question how reliable this data actually is, and the amount of incomplete and inconsistently tracked data makes cross-analyzing the data sets questionable. I will mainly focus on the ‘daily_activity’ data set for a more focused analysis and include some general recommendations for improving the consistency with which users log/track their data.
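How unevenly users contributed to a log can be checked by counting entries per Id. The following is a minimal sketch on a hypothetical toy log (the Ids and counts are made up for illustration); the same pattern could be applied to weight_new$Id.

```r
# Hypothetical weight log: one row per entry, identified by user Id
toy_weight <- data.frame(Id = c(101, 101, 101, 102, 103, 103))

# Number of log entries per user, most frequent first
entries_per_user <- sort(table(toy_weight$Id), decreasing = TRUE)
entries_per_user

# Share of all entries contributed by the two most frequent loggers
top_two_share <- sum(entries_per_user[1:2]) / sum(entries_per_user)
top_two_share
```

A share close to 1 would indicate that the summary statistics for that data set mostly describe two individuals rather than the whole group.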

I also noticed that there could be some duplicated rows in some of the data sets. I will confirm this and delete the duplicated rows for cleaner data.

nrow(daily_activity_new)

## [1] 863

nrow(weight_new)

## [1] 67

nrow(sleep_new)

## [1] 413

nrow(unique(daily_activity_new))

## [1] 863

nrow(unique(weight_new))

## [1] 67

nrow(unique(sleep_new))

## [1] 410

I will now create a new daily sleep data set with only unique rows.

sleep_daily <- unique(sleep_new)

view(weight_new)
view(sleep_daily)
view(daily_activity_new)

3. Analyze

I will now identify trends and relationships that I find while I analyze the data. Hopefully I can discover valuable insights from my analysis that can help answer the questions asked.

I realized that I needed to install the skimr package to look through and analyze the data. I will now take a look at the detailed summary of each set.

install.packages("skimr")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

library(skimr)

skim_without_charts(daily_activity_new)

Data summary
Name	daily_activity_new
Number of rows	863
Number of columns	15
_______________________
Column type frequency:
character	1
numeric	14
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
ActivityDate	0	1	8	9	0	31	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	1	4.857542e+09	2.418405e+09	1503960366	2.320127e+09	4.445115e+09	6.962181e+09	8.877689e+09
TotalSteps	1	8.319390e+03	4.744970e+03	4	4.923000e+03	8.053000e+03	1.109250e+04	3.601900e+04
TotalDistance	1	5.980000e+00	3.720000e+00	0	3.370000e+00	5.590000e+00	7.900000e+00	2.803000e+01
TrackerDistance	1	5.960000e+00	3.700000e+00	0	3.370000e+00	5.590000e+00	7.880000e+00	2.803000e+01
LoggedActivitiesDistance	1	1.200000e-01	6.500000e-01	0	0.000000e+00	0.000000e+00	0.000000e+00	4.940000e+00
VeryActiveDistance	1	1.640000e+00	2.740000e+00	0	0.000000e+00	4.100000e-01	2.270000e+00	2.192000e+01
ModeratelyActiveDistance	1	6.200000e-01	9.100000e-01	0	0.000000e+00	3.100000e-01	8.700000e-01	6.480000e+00
LightActiveDistance	1	3.640000e+00	1.860000e+00	0	2.340000e+00	3.580000e+00	4.890000e+00	1.071000e+01
SedentaryActiveDistance	1	0.000000e+00	1.000000e-02	0	0.000000e+00	0.000000e+00	0.000000e+00	1.100000e-01
VeryActiveMinutes	1	2.302000e+01	3.365000e+01	0	0.000000e+00	7.000000e+00	3.500000e+01	2.100000e+02
FairlyActiveMinutes	1	1.478000e+01	2.043000e+01	0	0.000000e+00	8.000000e+00	2.100000e+01	1.430000e+02
LightlyActiveMinutes	1	2.100200e+02	9.678000e+01	0	1.465000e+02	2.080000e+02	2.720000e+02	5.180000e+02
SedentaryMinutes	1	9.557500e+02	2.802900e+02	0	7.215000e+02	1.021000e+03	1.189000e+03	1.440000e+03
Calories	1	2.361300e+03	7.027100e+02	52	1.855500e+03	2.220000e+03	2.832000e+03	4.900000e+03

skim_without_charts(weight_new)

Data summary
Name	weight_new
Number of rows	67
Number of columns	9
_______________________
Column type frequency:
character	3
numeric	6
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
Date	1	8	9	31
Time	1	7	8	26
IsManualReport	1	4	5	2

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	0	1.00	7.009282e+09	1.950322e+09	1.503960e+09	6.962181e+09	6.962181e+09	8.877689e+09	8.877689e+09
WeightKg	0	1.00	7.204000e+01	1.392000e+01	5.260000e+01	6.140000e+01	6.250000e+01	8.505000e+01	1.335000e+02
WeightPounds	0	1.00	1.588100e+02	3.070000e+01	1.159600e+02	1.353600e+02	1.377900e+02	1.875000e+02	2.943200e+02
Fat	65	0.03	2.350000e+01	2.120000e+00	2.200000e+01	2.275000e+01	2.350000e+01	2.425000e+01	2.500000e+01
BMI	0	1.00	2.519000e+01	3.070000e+00	2.145000e+01	2.396000e+01	2.439000e+01	2.556000e+01	4.754000e+01
LogId	0	1.00	1.461772e+12	7.829948e+08	1.460444e+12	1.461079e+12	1.461802e+12	1.462375e+12	1.463098e+12

skim_without_charts(sleep_daily)

Data summary
Name	sleep_daily
Number of rows	410
Number of columns	6
_______________________
Column type frequency:
character	2
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
Date	0	1	8	9	0	31	0
Time	0	1	8	8	0	1	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	1	4.994963e+09	2.060863e+09	1503960366	3.977334e+09	4702921684.0	6962181067	8792009665
TotalSleepRecords	1	1.120000e+00	3.500000e-01	1	1.000000e+00	1.0	1	3
TotalMinutesAsleep	1	4.191700e+02	1.186400e+02	58	3.610000e+02	432.5	490	796
TotalTimeInBed	1	4.584800e+02	1.274600e+02	61	4.037500e+02	463.0	526	961

This provides a nice overview of the cleaned data and shows whether any issues stand out before the analysis. It looks good, but I would like to condense each set into the most reliable and relevant columns for a more focused analysis.

daily_activity_final <- daily_activity_new %>%
   select(Id, ActivityDate, TotalSteps, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories) %>%
   rename(Date = ActivityDate)
weight_final <- weight_new %>%
    select(Id, Date, BMI, WeightPounds, IsManualReport)
sleep_final <- sleep_daily %>%
    select(Id, Date, TotalMinutesAsleep, TotalTimeInBed)

Next I want to take a look at a more specific summary of the values.

summary(daily_activity_final)

##        Id                Date             TotalSteps    VeryActiveMinutes
##  Min.   :1.504e+09   Length:863         Min.   :    4   Min.   :  0.00   
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 4923   1st Qu.:  0.00   
##  Median :4.445e+09   Mode  :character   Median : 8053   Median :  7.00   
##  Mean   :4.858e+09                      Mean   : 8319   Mean   : 23.02   
##  3rd Qu.:6.962e+09                      3rd Qu.:11092   3rd Qu.: 35.00   
##  Max.   :8.878e+09                      Max.   :36019   Max.   :210.00   
##  FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes    Calories   
##  Min.   :  0.00      Min.   :  0.0        Min.   :   0.0   Min.   :  52  
##  1st Qu.:  0.00      1st Qu.:146.5        1st Qu.: 721.5   1st Qu.:1856  
##  Median :  8.00      Median :208.0        Median :1021.0   Median :2220  
##  Mean   : 14.78      Mean   :210.0        Mean   : 955.8   Mean   :2361  
##  3rd Qu.: 21.00      3rd Qu.:272.0        3rd Qu.:1189.0   3rd Qu.:2832  
##  Max.   :143.00      Max.   :518.0        Max.   :1440.0   Max.   :4900

summary(weight_final)

##        Id                Date                BMI         WeightPounds  
##  Min.   :1.504e+09   Length:67          Min.   :21.45   Min.   :116.0  
##  1st Qu.:6.962e+09   Class :character   1st Qu.:23.96   1st Qu.:135.4  
##  Median :6.962e+09   Mode  :character   Median :24.39   Median :137.8  
##  Mean   :7.009e+09                      Mean   :25.19   Mean   :158.8  
##  3rd Qu.:8.878e+09                      3rd Qu.:25.56   3rd Qu.:187.5  
##  Max.   :8.878e+09                      Max.   :47.54   Max.   :294.3  
##  IsManualReport    
##  Length:67         
##  Class :character  
##  Mode  :character  
##                    
##                    
##

summary(sleep_final)

##        Id                Date           TotalMinutesAsleep TotalTimeInBed 
##  Min.   :1.504e+09   Length:410         Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:3.977e+09   Class :character   1st Qu.:361.0      1st Qu.:403.8  
##  Median :4.703e+09   Mode  :character   Median :432.5      Median :463.0  
##  Mean   :4.995e+09                      Mean   :419.2      Mean   :458.5  
##  3rd Qu.:6.962e+09                      3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :8.792e+09                      Max.   :796.0      Max.   :961.0

3.1 Trends

The median Total Steps for a user is 8,053.
The mean minutes for Very Active is 23.02 (median 7), Fairly Active is 14.78 (median 8), Lightly Active is 210 (median 208), and Sedentary is 955.8 (median 1,021).
The mean BMI is 25.19 (median 24.39).
The mean minutes asleep is 419.2 (median 432.5), and the mean minutes in bed is 458.5 (median 463).
Again, there are outliers in the data that were not removed due to the lack of information. These were kept in case those extreme values were in fact legitimate. However, if those values were not legitimate, the average values above will be skewed.
Other trends I noticed were that users were not consistent in logging their data, and that certain individuals who did log consistently were not losing weight or seeing results over the duration of the data collection.
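The gap between mean and median matters most for skewed columns such as VeryActiveMinutes, where the summary above shows a mean of 23.02 but a median of 7. A minimal sketch with hypothetical values (not taken from the FitBit data) shows how a few long sessions pull the mean above the median:

```r
# Hypothetical right-skewed values: many short days, a few long sessions
very_active <- c(0, 0, 0, 5, 7, 10, 35, 60, 120)

center <- c(mean = mean(very_active), median = median(very_active))
center
```

Reporting both statistics side by side, as in the list above, avoids describing a mean as a median when the two differ.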

3.2 Conclusions from the Trends

According to a joint research investigation by the National Cancer Institute (NCI), the National Institute on Aging (NIA), and the Centers for Disease Control and Prevention (CDC) (amongst other research studies), the ideal daily number of Total Steps one should achieve is 10,000. With a median of 8,053 steps, the typical individual here is not reaching that goal.
One reason for this is their activity level. On average, the individuals spent 955.8 minutes a day, roughly 16 hours, being sedentary.
Since the average BMI is 25.19, this puts these individuals in the overweight category, according to the World Health Organisation (WHO).
It makes sense that people who have a higher BMI are wearing FitBits: they have taken the first step in their health journey. However, they are not being active enough to see change. The more active someone is, the more steps they will take and the more calories they will burn, which over time will lower their BMI.
Furthermore, the average person is getting just under the minimum of 7 hours of sleep recommended by the National Sleep Foundation (NSF). On the positive side, the individuals spend only about 39 minutes a night in bed awake (458.5 minutes in bed versus 419.2 minutes asleep).
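The unit conversions behind these conclusions can be checked directly. This short sketch uses only the means reported in the summary() output; the variable names are chosen here for illustration.

```r
# Means taken from the summary() output for daily_activity_final and sleep_final
sedentary_min <- 955.8
asleep_min    <- 419.2
in_bed_min    <- 458.5

sedentary_hours <- sedentary_min / 60       # about 15.9 hours
asleep_hours    <- asleep_min / 60          # about 6.99 hours
awake_in_bed    <- in_bed_min - asleep_min  # about 39.3 minutes

round(c(sedentary_hours, asleep_hours, awake_in_bed), 2)
```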

3.3 Questions to Answer

With all of these conclusions made, what can Bellabeat do to fix these issues?
How could Bellabeat be able to promote consistency with user logging/tracking of data?
How could Bellabeat help the average user achieve the minimum Total Steps a day?
How could Bellabeat help the users reach a healthy BMI whilst losing weight at a healthy and consistent rate?
How could Bellabeat help the users improve their sleep?

4. Share

Now let’s look at the findings in visualizations.

VeryActiveMin <- sum(daily_activity_final$VeryActiveMinutes)
FairlyActiveMin <- sum(daily_activity_final$FairlyActiveMinutes)
LightlyActiveMin <- sum(daily_activity_final$LightlyActiveMinutes)
SedentaryMin <- sum(daily_activity_final$SedentaryMinutes)
TotalMin <- VeryActiveMin + FairlyActiveMin + LightlyActiveMin + SedentaryMin

slices <- c(VeryActiveMin,FairlyActiveMin, LightlyActiveMin, SedentaryMin)
lbls <- c("VeryActive","FairlyActive","LightlyActive","Sedentary")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep="")
pie(slices, labels = lbls, col = rainbow(length(lbls)), main = "Percentage of Activity in Minutes")

With the Daily Activity Levels in minutes shown as percentages, it is visually clear that the individuals are not active enough: 79% of the tracked time over the month was spent sedentary. For a fitness tracking app, this is not good to see, especially when very active and fairly active minutes make up only 2% and 1% of the total time, respectively.
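The percentages in the chart can be cross-checked from the mean daily minutes in the skim output for daily_activity_new. Because every column has the same number of rows and no missing values, proportions of means equal proportions of sums. A minimal sketch:

```r
# Mean daily minutes per activity level, from the skim output
mean_minutes <- c(VeryActive = 23.02, FairlyActive = 14.78,
                  LightlyActive = 210.02, Sedentary = 955.75)

pct <- round(mean_minutes / sum(mean_minutes) * 100)
pct  # VeryActive 2, FairlyActive 1, LightlyActive 17, Sedentary 79
```

The rounded values add up to 99 rather than 100, which is a normal side effect of rounding each slice separately.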

ggplot(data=daily_activity_final) +
  geom_point(mapping=aes(x=TotalSteps, y=Calories), color="orange") +
  geom_smooth(mapping=aes(x=TotalSteps, y=Calories)) +
  labs(title="The Relationship Between Total Steps And Calories Burned", x="Total Steps", y="Calories Burned (kcal)")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Although this relationship should be obvious, the plot confirms it: the more steps an individual takes, the more active they are and the more calories they burn. The typical person in this dataset is only reaching about 8,000 Total Steps a day, and daily calories burned sit at a median of 2,220 and a mean of 2,361.
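The strength of the relationship in the scatter plot could also be quantified rather than judged by eye. The sketch below uses hypothetical values purely for illustration (they are not taken from the FitBit data); on the real data the same calls would take daily_activity_final$TotalSteps and daily_activity_final$Calories.

```r
# Hypothetical daily values, for illustration only
toy <- data.frame(
  TotalSteps = c(2000, 4500, 8000, 11000, 15000),
  Calories   = c(1700, 1900, 2250, 2600, 3100)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r <- cor(toy$TotalSteps, toy$Calories)

# Simple linear fit: the slope estimates extra calories burned per additional step
fit <- lm(Calories ~ TotalSteps, data = toy)
slope <- coef(fit)[["TotalSteps"]]

c(correlation = r, calories_per_step = slope)
```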

Without more information regarding the person’s age, sex, and height, it would be impossible to say how many calories the person needs to burn to lose weight at a healthy rate. However, they do not appear to be burning enough calories to see a change in their weight or BMI: the individuals who logged those values did not see an improvement over a month’s worth of data collection.

combined_data <- merge(daily_activity_final, sleep_final, by="Id")
ggplot(data=combined_data) +
  geom_point(mapping=aes(x=TotalMinutesAsleep, y=VeryActiveMinutes, color="VeryActiveMinutes")) +
  geom_smooth(mapping=aes(x=TotalMinutesAsleep, y=VeryActiveMinutes), color="blue") +
  labs(title="The Relationship Between Activity Levels and Total Minutes Asleep", x="Total Minutes Asleep", y="Minutes of Activity")

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

ggplot(data=combined_data) +
  geom_point(mapping=aes(x=TotalMinutesAsleep, y=FairlyActiveMinutes, color="FairlyActiveMinutes")) +
  geom_smooth(mapping=aes(x=TotalMinutesAsleep, y=FairlyActiveMinutes), color="blue") +
  labs(title="The Relationship Between Activity Levels and Total Minutes Asleep", x="Total Minutes Asleep", y="Minutes of Activity")

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

ggplot(data=combined_data) +
  geom_point(mapping=aes(x=TotalMinutesAsleep, y=LightlyActiveMinutes, color="LightlyActiveMinutes")) +
  geom_smooth(mapping=aes(x=TotalMinutesAsleep, y=LightlyActiveMinutes)) +
  labs(title="The Relationship Between Activity Levels and Total Minutes Asleep", x="Total Minutes Asleep", y="Minutes of Activity")

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

ggplot(data=combined_data) +
  geom_point(mapping=aes(x=TotalMinutesAsleep, y=SedentaryMinutes, color="SedentaryMinutes")) +
  geom_smooth(mapping=aes(x=TotalMinutesAsleep, y=SedentaryMinutes)) +
  labs(title="The Relationship Between Activity Levels and Total Minutes Asleep", x="Total Minutes Asleep", y="Minutes of Activity")

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

I wanted to take a look at the relationship between Total Minutes Asleep and Activity Minutes to see if the amount of sleep correlates with activity level. I assumed that the more sleep individuals got, the more active they would be. With this dataset that was not exactly true: regardless of the amount of sleep, the average person was sedentary a majority of the time. Perhaps the biggest twist was that the more sleep a person got, the more sedentary they were. Note that combined_data was merged on Id alone, so each activity day is paired with every sleep day for the same user rather than only with the same date; these plots should therefore be read with caution.
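The effect of the join key can be seen on a small example. This is a minimal sketch with hypothetical toy data, assuming the Date columns in both data sets hold the same text format; it compares merging on Id alone with merging on both Id and Date.

```r
# Hypothetical data: one user, two days of activity and two nights of sleep
toy_activity <- data.frame(Id = c(1, 1),
                           Date = c("4/12/2016", "4/13/2016"),
                           SedentaryMinutes = c(700, 900))
toy_sleep <- data.frame(Id = c(1, 1),
                        Date = c("4/12/2016", "4/13/2016"),
                        TotalMinutesAsleep = c(420, 360))

# Joining on Id alone pairs every activity day with every sleep day (2 x 2 rows)
by_id <- merge(toy_activity, toy_sleep, by = "Id")

# Joining on Id and Date keeps only same-day pairs
by_id_date <- merge(toy_activity, toy_sleep, by = c("Id", "Date"))

nrow(by_id)       # 4
nrow(by_id_date)  # 2
```

Merging on both keys keeps one row per user per day, which is the pairing a sleep-versus-activity comparison needs.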

5. Act

5.1 Revisiting Business Task

The business task is to analyze smart device usage data of non-Bellabeat smart devices to gain insight into relevant consumer trends within the global smart device market. We will also try to discover how to apply these trends to Bellabeat customers and use them to influence future Bellabeat marketing strategies. This will be done by applying said insights to the Bellabeat App and to future products in order to maximize profits and growth for the company and to capitalize on Bellabeat’s rapidly growing consumer base in the smart device/tech-wellness space.

5.2 Trends Identified

The median Total Steps per day for the participating individuals was 8,053, which is almost 2,000 steps below the daily goal suggested by the NCI, NIA and the CDC.
On average, 79% of total minutes per day were spent being sedentary by the participating individuals over the course of a month.
The individuals had an average BMI of 25.19, which puts them into the overweight category.
On average, these individuals slept slightly less than the suggested minimum of 7 hours of sleep.
These individuals were not consistent with logging/tracking their data each day over the course of the month, and some individuals didn’t log/track their sleep or weight (only twenty-four unique users input sleep and eight input weight - only two of the eight made up the majority of the inputs).
The individuals did not lose weight or improve their BMI, sleep quality or activity levels.

5.3 Answering Questions & Recommendations

Bellabeat could offer an incentive for daily tracking, for example an in-app competition against other users or friends. Badges and certificates could also help with consistency.
- Through the competition, users could win t-shirts, koozies, etc., which provides more marketing. For the yearly competition we could give away another product, which could lead to more data and healthier users.
- Bellabeat could also offer additional points during the weekend to incentivize logging at a time when data is traditionally not logged.
Bellabeat could use a TDEE (Total Daily Energy Expenditure) calculator that takes age, weight, height and other information as input to produce accurate and uniform results.
- The calculator can help determine the caloric deficit the user needs to meet their goals. For example, if you burn two thousand calories a day, eating five hundred calories fewer than that each day corresponds to losing roughly one pound a week.
- The user could sign up for push notifications that help them reach their caloric goals and alert them when they have been sedentary for too long.
- The app could provide a list of activities and exercises to do outside of the gym, since most users were lightly active or sedentary for a majority of the time.
- Bellabeat could also provide nutritional and exercise coaching with a paid membership, like Peloton without the bike or equipment.
Bellabeat should track sleeping habits automatically with the user’s consent.
- Using the Leaf product and an in-app notification could provide a better analysis of the user’s ideal sleep schedule.
- The app could notify the user of the ideal time to get off their phone before bed, and when they should be in bed based on their sleep schedule.
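The TDEE recommendation above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how Bellabeat would implement it: it assumes the Mifflin-St Jeor equation for women as the basal metabolic rate estimate, an activity factor of 1.2 for a sedentary lifestyle, and the common rule of thumb of about 3,500 kcal per pound of body weight. The function names and the example user are hypothetical.

```r
# Basal metabolic rate via the Mifflin-St Jeor equation for women
# (an assumed choice of formula; other estimates exist)
bmr_female <- function(weight_kg, height_cm, age_years) {
  10 * weight_kg + 6.25 * height_cm - 5 * age_years - 161
}

# TDEE = BMR x activity factor; 1.2 is a commonly used sedentary factor
tdee <- function(bmr, activity_factor = 1.2) bmr * activity_factor

# Rule of thumb: about 3,500 kcal of deficit per pound of body weight
weekly_loss_lb <- function(daily_deficit_kcal) daily_deficit_kcal * 7 / 3500

# Hypothetical user: 62.5 kg, 165 cm, 30 years old
bmr <- bmr_female(62.5, 165, 30)   # 1345.25 kcal/day
user_tdee <- tdee(bmr)             # 1614.3 kcal/day
loss <- weekly_loss_lb(500)        # 1 pound per week
```

Age and height are not available in the FitBit data, which is one reason the recommendation calls for collecting them in the app.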

5.4 Future Works

If this data set were to be collected again, I would like to see the following parameters met in order to create a flawless and in-depth analysis of this type of data:

A larger sample size with more responsive users would raise the confidence level and lower the margin of error.
Having a longer data collection period of at least six months.
Collecting the data in house or with a reliable third party.
More information from each user including age and height.
Ensuring that the data is collected without bias.
A more relevant data set would also show more relevant results.

Google Data Analytics Capstone

Andrew Curry

07/03/2022
