In this capstone project I carried out an exploratory data analysis of the FitBit Fitness Tracker Data, in particular the datasets of users’ daily activities and sleep tracking. Using these datasets I was able to find out on which days users work out the most (and the least). This may serve as an indicator of how busy or motivated they are, and it could help companies such as Bellabeat time their promotions and special deals. I also found that participants regard running and jogging as the most effective way to burn calories and, more importantly, that they prefer to exercise outdoors rather than go to a gym. Finally, I detected a strong correlation between the time users go to bed and the quality of sleep they tend to have; this detail could potentially be used to address sleep disorders and improve overall sleep quality.
To plot bedtimes from 20:00 to 08:00, I had to offset the times and replace the date with the same value for all rows. There is probably a better way to do this, but I could not find one and had to work it out myself. Also, on first getting acquainted with the datasets, I noticed a significant difference in the number of participants monitoring their daily activity, heart rate and sleep. I tried to find evidence that the last two features consume a lot of energy and that users prefer not to use them in order to extend battery life. However, I was not able to find a difference in on-wrist and off-wrist (charging) times between the groups that do and do not use these features. Such data, if available, could help reveal differences in how different groups of people use their devices, which in turn could help target and serve users better.
After examining the available data, I decided to use two datasets for my analysis: dailyActivity_merged.csv and minuteSleep_merged.csv. Their contents are described in the metadata file, which was kindly supplied by Laimis Andrijauskas.
Load up the necessary libraries:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(skimr)
library(ggcorrplot)
Load the data (the file paths are relative to the working directory):
daily_activity_merged <- read.csv("datasets/dailyActivity_merged.csv")
minute_sleep_merged <- read.csv("datasets/minuteSleep_merged.csv")
Check the dataset for duplicates and look at its structure:
anyDuplicated(daily_activity_merged)
## [1] 0
str(daily_activity_merged)
## 'data.frame': 940 obs. of 15 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
Id is numerical and ActivityDate is a character string; moreover, its MM/DD/YYYY format is not standard. Convert Id to a factor and ActivityDate to a proper Date column:
daily_activity_merged$Id <- as.factor(daily_activity_merged$Id)
daily_activity_merged <- daily_activity_merged %>%
mutate(Date = as.Date(ActivityDate, format = "%m/%d/%Y")) %>%
select(!ActivityDate)
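As a quick sanity check of the MM/DD/YYYY parsing, here is a small sketch on a few literal strings copied from the str() output above: base R's as.Date() with an explicit format and lubridate's mdy() should produce the same dates. It only exercises hand-typed values, not the full dataset.

```r
library(lubridate)

# Literal values in the dataset's MM/DD/YYYY format (taken from the str() output)
raw_dates <- c("4/12/2016", "4/13/2016", "5/12/2016")

# Base R: spell out the format explicitly
parsed_base <- as.Date(raw_dates, format = "%m/%d/%Y")

# lubridate: mdy() takes the month-day-year order from its name
parsed_lubridate <- mdy(raw_dates)

print(parsed_base)
print(all(parsed_base == parsed_lubridate))
```

Either form avoids the intermediate POSIXlt object that strptime() creates.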
Now that the formats are more appropriate, I can take a closer look at the data at hand.
Count the number of unique ids (participants):
length(unique(daily_activity_merged$Id))
## [1] 33
Add a column with the number of days each user submitted data, and remove users with 7 days (one week) of records or fewer:
daily_activity_merged <- daily_activity_merged %>%
group_by(Id) %>%
mutate(Count = n()) %>%
filter(Count > 7) %>%
select(Id,
Count,
Date,
everything()) %>%
ungroup()
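The same per-user filter can be written with dplyr's add_count(), which attaches the group size as a new column. The sketch below uses hypothetical toy data (three made-up users) and checks that it matches the group_by() + n() version.

```r
library(dplyr)

# Hypothetical toy data: user "a" logged 2 days, "b" 1 day, "c" 8 days
toy <- tibble(Id = c("a", "a", "b", rep("c", 8)),
              TotalSteps = 1:11)

# add_count() appends the number of rows per Id as a column named Count
kept <- toy %>%
  add_count(Id, name = "Count") %>%
  filter(Count > 7)   # keep only users with more than 7 days of data

# Equivalent grouped version, as in the pipeline above
kept_grouped <- toy %>%
  group_by(Id) %>%
  mutate(Count = n()) %>%
  filter(Count > 7) %>%
  ungroup()

print(kept)
print(isTRUE(all.equal(as.data.frame(kept), as.data.frame(kept_grouped))))
```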
Run the summary function to check if the values in columns make sense and aren’t out of range:
summary(daily_activity_merged)
## Id Count Date TotalSteps
## 1503960366: 31 Min. :18.00 Min. :2016-04-12 Min. : 0
## 1624580081: 31 1st Qu.:30.00 1st Qu.:2016-04-19 1st Qu.: 3790
## 1844505072: 31 Median :31.00 Median :2016-04-26 Median : 7441
## 1927972279: 31 Mean :29.68 Mean :2016-04-26 Mean : 7654
## 2022484408: 31 3rd Qu.:31.00 3rd Qu.:2016-05-04 3rd Qu.:10734
## 2026352035: 31 Max. :31.00 Max. :2016-05-12 Max. :36019
## (Other) :750
## TotalDistance TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.000 Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 2.620 1st Qu.: 2.620 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 5.265 Median : 5.265 Median :0.0000 Median : 0.220
## Mean : 5.501 Mean : 5.487 Mean :0.1086 Mean : 1.509
## 3rd Qu.: 7.720 3rd Qu.: 7.713 3rd Qu.:0.0000 3rd Qu.: 2.090
## Max. :28.030 Max. :28.030 Max. :4.9421 Max. :21.920
##
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.945 1st Qu.:0.000000
## Median :0.2400 Median : 3.365 Median :0.000000
## Mean :0.5697 Mean : 3.344 Mean :0.001613
## 3rd Qu.:0.8000 3rd Qu.: 4.790 3rd Qu.:0.000000
## Max. :6.4800 Max. :10.710 Max. :0.110000
##
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.0
## Median : 4.00 Median : 7.00 Median :199.0 Median :1057.0
## Mean : 21.25 Mean : 13.62 Mean :193.2 Mean : 990.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.2 3rd Qu.:1226.8
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
##
## Calories
## Min. : 0
## 1st Qu.:1830
## Median :2134
## Mean :2305
## 3rd Qu.:2794
## Max. :4900
##
LoggedActivitiesDistance and SedentaryActiveDistance consist mostly of zero values, and I don’t see how they could be used in the analysis, so I might as well drop them altogether. I also remove the days on which zero steps or zero distance were recorded:
daily_activity_merged <- daily_activity_merged %>%
filter(TotalSteps > 0 &
TrackerDistance > 0 &
TotalDistance > 0) %>%
select(!c(LoggedActivitiesDistance,
SedentaryActiveDistance))
Use the skim function from skimr to see how the data is distributed, paying attention to the columns “n_missing” and “complete_rate”:
skimr::skim(daily_activity_merged)
Name | daily_activity_merged |
---|---|
Number of rows | 859 |
Number of columns | 14 |
Column type frequency: | |
Date | 1 |
factor | 1 |
numeric | 12 |
Group variables | None |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
Date | 0 | 1 | 2016-04-12 | 2016-05-12 | 2016-04-26 | 31 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
Id | 0 | 1 | FALSE | 32 | 162: 31, 202: 31, 202: 31, 232: 31 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Count | 0 | 1 | 29.70 | 3.07 | 18.00 | 30.00 | 31.00 | 31.00 | 31.00 | ▁▁▁▁▇ |
TotalSteps | 0 | 1 | 8340.26 | 4743.45 | 8.00 | 4927.50 | 8059.00 | 11100.50 | 36019.00 | ▇▇▂▁▁ |
TotalDistance | 0 | 1 | 5.99 | 3.72 | 0.01 | 3.38 | 5.60 | 7.91 | 28.03 | ▇▇▁▁▁ |
TrackerDistance | 0 | 1 | 5.98 | 3.70 | 0.01 | 3.38 | 5.60 | 7.88 | 28.03 | ▇▇▁▁▁ |
VeryActiveDistance | 0 | 1 | 1.64 | 2.74 | 0.00 | 0.00 | 0.42 | 2.28 | 21.92 | ▇▁▁▁▁ |
ModeratelyActiveDistance | 0 | 1 | 0.62 | 0.91 | 0.00 | 0.00 | 0.31 | 0.87 | 6.48 | ▇▁▁▁▁ |
LightActiveDistance | 0 | 1 | 3.64 | 1.86 | 0.00 | 2.34 | 3.58 | 4.90 | 10.71 | ▅▇▆▁▁ |
VeryActiveMinutes | 0 | 1 | 23.12 | 33.69 | 0.00 | 0.00 | 7.00 | 35.50 | 210.00 | ▇▂▁▁▁ |
FairlyActiveMinutes | 0 | 1 | 14.84 | 20.45 | 0.00 | 0.00 | 8.00 | 21.00 | 143.00 | ▇▁▁▁▁ |
LightlyActiveMinutes | 0 | 1 | 210.51 | 96.62 | 0.00 | 147.00 | 209.00 | 272.00 | 518.00 | ▃▇▇▃▁ |
SedentaryMinutes | 0 | 1 | 954.54 | 280.01 | 0.00 | 721.00 | 1020.00 | 1188.50 | 1440.00 | ▁▁▇▆▆ |
Calories | 0 | 1 | 2363.60 | 702.91 | 52.00 | 1857.50 | 2220.00 | 2834.00 | 4900.00 | ▁▆▇▃▁ |
For further analysis I need a column with the number of the day of the week (Monday = 1, ..., Sunday = 7). It can be derived from the Date column with the lubridate package:
daily_activity_merged$DayOfWeek <- wday(daily_activity_merged$Date, week_start = 1)
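To check that Monday-first numbering behaves as intended, this sketch compares a manual ifelse() remapping of lubridate's default Sunday-first wday() with the week_start = 1 argument on one known week (11 to 17 April 2016, Monday to Sunday). It assumes the lubridate.week.start option has not been changed from its default.

```r
library(lubridate)

# One full week: Monday 11 April to Sunday 17 April 2016
week <- as.Date("2016-04-11") + 0:6

# Default wday(): Sunday = 1, ..., Saturday = 7, remapped by hand
manual <- ifelse(wday(week) == 1, 7, wday(week) - 1)

# week_start = 1 gives Monday = 1, ..., Sunday = 7 directly
builtin <- wday(week, week_start = 1)

print(data.frame(week, weekday = weekdays(week), manual, builtin))
```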
Let’s take a look at the amount of time people spend on each activity. To do that, the original “wide” data has to be turned into a “long” format using the gather function:
DAM_mins_long <- daily_activity_merged %>%
select(Date,
DayOfWeek,
VeryActiveMinutes,
FairlyActiveMinutes,
LightlyActiveMinutes,
SedentaryMinutes ) %>%
gather(key = IntensityLevel, value = Minutes, c(3:6))
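gather() still works but is superseded in tidyr by pivot_longer(). The sketch below reshapes two rows copied from the str() output above; selecting the columns by name with ends_with("Minutes") is an alternative to the positional c(3:6), so the code keeps working if the column order changes.

```r
library(dplyr)
library(tidyr)

# Two rows copied from the str() output above (first user, 12 and 13 April 2016)
wide <- tibble(Date = as.Date(c("2016-04-12", "2016-04-13")),
               DayOfWeek = c(2, 3),
               VeryActiveMinutes = c(25, 21),
               FairlyActiveMinutes = c(13, 19),
               LightlyActiveMinutes = c(328, 217),
               SedentaryMinutes = c(728, 776))

# One row per (day, intensity level) pair
long <- wide %>%
  pivot_longer(cols = ends_with("Minutes"),
               names_to = "IntensityLevel",
               values_to = "Minutes")

print(long)
```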
Now the bar chart can be plotted:
(barplot_act <-ggplot(DAM_mins_long, aes(x = reorder(IntensityLevel, desc(Minutes)), y = Minutes, fill = IntensityLevel))+
geom_bar(stat = "identity")+
scale_x_discrete(labels = c("SedentaryMinutes"="Sedentary",
"LightlyActiveMinutes" = "Lightly active",
"VeryActiveMinutes" = "Very Active",
"FairlyActiveMinutes" = "Fairly Active"))+
scale_y_continuous(labels = label_comma())+
scale_fill_manual(values = c("#CDC0B0",
"#EEC900",
"#104E8B",
"#fe7f2d"))+
theme_minimal()+
theme(legend.position = "none")+
labs(title = "Most and least popular activities",
subtitle = "How much time users spend performing each activity",
x = ""))
As a next step, let’s create a correlation matrix to see whether there are any strong relationships between the variables. But first, the non-numerical columns must be removed:
corr_mat <- daily_activity_merged %>%
select(!c(Id, Date))
corr_mat <- round(cor(corr_mat),1)
(corr_plot <- ggcorrplot(corr_mat,
hc.order = TRUE,
type = "lower",
lab = TRUE,
lab_size = 2.8,
colors = c("#104E8B",
"white",
"#fe7f2d"))+
theme_minimal()+
labs(x = "",
y = "")+
theme(legend.position = "none",
axis.text.x = element_text(angle = 45,
hjust = 1)))
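ggcorrplot() expects the correlation matrix itself, so cor() only needs to be called once. This sketch runs the preparation step on five rows copied from the str() output above (with a made-up factor Id and a Date column added); select(where(is.numeric)) is shown as an alternative to listing the non-numeric columns by hand.

```r
library(dplyr)

# Five rows copied from the str() output above, plus non-numeric Id and Date
toy <- tibble(Id = factor(rep("1503960366", 5)),
              Date = as.Date("2016-04-12") + 0:4,
              TotalSteps = c(13162, 10735, 10460, 9762, 12669),
              VeryActiveMinutes = c(25, 21, 30, 29, 36),
              Calories = c(1985, 1797, 1776, 1745, 1863))

# Keep only numeric columns (factors and Dates are not numeric)
num_only <- toy %>% select(where(is.numeric))

# Compute the correlation matrix once; this is the object to pass to ggcorrplot()
corr_mat <- round(cor(num_only), 1)
print(corr_mat)
```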
The strong correlation between LightlyActiveMinutes and LightActiveDistance is explained by the fact that these values are calculated from one another; this is also the case with FairlyActiveMinutes and ModeratelyActiveDistance, as well as with VeryActiveMinutes and VeryActiveDistance.
Let’s look deeper into the relationships between steps and calories and between steps and distance:
(steps_cal <- ggplot(daily_activity_merged, aes(x = TotalSteps, y = Calories))+
geom_point()+
stat_smooth(formula = 'y ~ x',
method = "lm",
color = "#fe7f2d")+
theme_minimal()+
labs(title = "Steps vs. calories",
subtitle = "Correlation between number of steps taken and amount of calories burned",
x = "Steps"))
(steps_dist <-ggplot(daily_activity_merged, aes(x = TotalSteps, y =TotalDistance))+
geom_point()+
stat_smooth(formula = 'y ~ x',
method = "lm",
color = "#fe7f2d")+
theme_minimal()+
labs(title = "Steps vs. Distance (km)",
subtitle = "Steps counted by pedometer and distance covered according to GPS tracker",
x = "Steps",
y = "Distance (km)"))
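The scatter plots show the relationship visually; a number can be put on it with cor() and the slope of a least-squares fit, the same kind of straight line that stat_smooth(method = "lm") draws. The sketch uses only the first ten days of one user, copied from the str() output above, so it illustrates the calculation rather than the dataset-wide result; for the full data the vectors would be daily_activity_merged$TotalSteps and daily_activity_merged$Calories.

```r
# First ten days of one user, copied from the str() output above
steps    <- c(13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506, 10544, 9819)
calories <- c(1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1775)

# Pearson correlation coefficient
r <- cor(steps, calories)

# Least-squares line: calories = intercept + slope * steps
fit <- lm(calories ~ steps)
slope <- unname(coef(fit)["steps"])

cat("r =", round(r, 2), "| extra calories per 1,000 steps =", round(1000 * slope), "\n")
```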
It would be interesting to see how participants’ motivation changes over the course of the week. To show this, let’s calculate the average number of calories burnt on each day of the week relative to the overall average:
cal_mean <- mean(daily_activity_merged$Calories)
cal_by_day <- daily_activity_merged %>%
group_by(DayOfWeek) %>%
summarize(CaloriesMean = mean(Calories),
RelativeVal = 100*(mean(Calories)-cal_mean)/cal_mean)
(cal_by_day_plot <- ggplot(cal_by_day, aes(x = as.factor(DayOfWeek), y = RelativeVal, fill = RelativeVal))+
geom_bar(stat = "identity")+
scale_x_discrete(labels = c("Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Saturday",
"Sunday"))+
theme_minimal()+
labs(title = "Relative calories burnt by day of week",
x = "",
y = "Relative calories (%)")+
theme(legend.position = "none"))
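The relative value for each weekday is 100 × (day mean − overall mean) / overall mean. This sketch runs the same calculation on hypothetical toy data (five made-up user-days) so that the numbers can be checked by hand.

```r
library(dplyr)

# Hypothetical toy data: 5 user-days spread over three weekdays
toy <- tibble(DayOfWeek = c(1, 1, 2, 2, 3),
              Calories  = c(2000, 2200, 1800, 1900, 2600))

cal_mean <- mean(toy$Calories)   # overall mean: 2100

rel <- toy %>%
  group_by(DayOfWeek) %>%
  summarize(CaloriesMean = mean(Calories),
            N = n(),
            # percentage difference from the overall mean
            RelativeVal = 100 * (CaloriesMean - cal_mean) / cal_mean)

print(rel)
```

Because the overall mean is taken over user-days, the relative values weighted by the number of records per weekday average out to zero, so the bars in the chart always sit on both sides of the axis.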
Now let’s take a look at another dataset that contains information about users’ sleeping routines to see if it needs cleaning, reformatting or structuring:
str(minute_sleep_merged)
## 'data.frame': 188521 obs. of 4 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ date : chr "4/12/2016 2:47:30 AM" "4/12/2016 2:48:30 AM" "4/12/2016 2:49:30 AM" "4/12/2016 2:50:30 AM" ...
## $ value: int 3 2 1 1 1 1 1 2 2 2 ...
## $ logId: num 1.14e+10 1.14e+10 1.14e+10 1.14e+10 1.14e+10 ...
Id and logId are numerical rather than factors, and date is a character string. Before reshaping the dataset, let’s remove duplicates if there are any:
length(minute_sleep_merged$Id)
## [1] 188521
minute_sleep_merged <- distinct(minute_sleep_merged)
length(minute_sleep_merged$Id)
## [1] 187978
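The difference between the two row counts is the number of fully duplicated rows. This sketch uses hypothetical toy rows in the raw minuteSleep layout to show that sum(duplicated()) counts the rows that distinct() drops.

```r
library(dplyr)

# Hypothetical toy rows in the raw minuteSleep layout; the first row is repeated
toy <- data.frame(Id    = c(1503960366, 1503960366, 1503960366),
                  date  = c("4/12/2016 2:47:30 AM", "4/12/2016 2:47:30 AM",
                            "4/12/2016 2:48:30 AM"),
                  value = c(3, 3, 2),
                  logId = c(11380564589, 11380564589, 11380564589))

n_dupes <- sum(duplicated(toy))   # rows identical to an earlier row
deduped <- distinct(toy)          # keeps the first occurrence of each row

cat("duplicates:", n_dupes, "| rows before:", nrow(toy), "| after:", nrow(deduped), "\n")
```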
For further analysis I need to bring the dataset into shape, which here means converting the columns to the proper formats and filtering out records made between 08:00 and 20:00, as I am not interested in naps:
minute_sleep_merged <- minute_sleep_merged %>%
mutate(DateNew = strptime(date, "%m/%d/%Y %I:%M:%S %p")) %>%
tidyr::separate(DateNew, c("Date", "Time"), sep = " ", remove = FALSE) %>%
filter(Time >= "20:00:00" | Time <= "08:00:00") %>%
select(Id, LogId = logId, DateTime = DateNew, Value = value)
minute_sleep_merged$Id <- as.factor(minute_sleep_merged$Id)
minute_sleep_merged$LogId <- as.factor(minute_sleep_merged$LogId)
minute_sleep_merged$DateTime <- as.POSIXct(minute_sleep_merged$DateTime)
str(minute_sleep_merged)
## 'data.frame': 172148 obs. of 4 variables:
## $ Id : Factor w/ 23 levels "1503960366","1644430081",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LogId : Factor w/ 420 levels "11372227280",..: 14 14 14 14 14 14 14 14 14 14 ...
## $ DateTime: POSIXct, format: "2016-04-12 02:47:30" "2016-04-12 02:48:30" ...
## $ Value : int 3 2 1 1 1 1 1 2 2 2 ...
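The filter above compares times as strings, which works because they are zero-padded "HH:MM:SS", so alphabetical order matches chronological order. This sketch shows a numeric alternative on hypothetical timestamps: it converts each time to seconds since midnight and keeps the same inclusive 20:00:00 to 08:00:00 window.

```r
library(lubridate)

# Hypothetical timestamps; UTC avoids daylight-saving surprises
times <- as.POSIXct(c("2016-04-12 02:47:30", "2016-04-12 08:00:00",
                      "2016-04-12 08:00:30", "2016-04-12 13:00:00",
                      "2016-04-12 20:00:00"), tz = "UTC")

# Seconds elapsed since midnight
secs <- hour(times) * 3600 + minute(times) * 60 + second(times)

# Same inclusive window as the string comparison: 20:00:00 to 08:00:00
is_night <- secs >= 20 * 3600 | secs <= 8 * 3600
print(data.frame(times, is_night))
```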
skim(minute_sleep_merged)
Name | minute_sleep_merged |
---|---|
Number of rows | 172148 |
Number of columns | 4 |
Column type frequency: | |
factor | 2 |
numeric | 1 |
POSIXct | 1 |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
Id | 0 | 1 | FALSE | 23 | 202: 15054, 837: 14836, 696: 14072, 555: 14026 |
LogId | 0 | 1 | FALSE | 420 | 115: 721, 114: 705, 115: 669, 115: 617 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Value | 0 | 1 | 1.09 | 0.32 | 1 | 1 | 1 | 1 | 3 | ▇▁▁▁▁ |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
DateTime | 0 | 1 | 2016-04-11 20:48:00 | 2016-05-12 08:00:00 | 2016-04-26 06:26:00 | 39062 |
To plot the hours from 20:00 to 08:00 on a continuous axis, I shift all times forward by 4 hours so that 20:00 and 08:00 become 00:00 and 12:00 respectively; this way every night falls within a single calendar day. I will create a subset that contains the shifted times, Ids and LogIds:
MSM_subset <- minute_sleep_merged %>%
mutate(ShiftedTime = DateTime+hours(4)) %>%
tidyr::separate(ShiftedTime, c("Date", "SftTime"), sep = " " ) %>%
select(Id, LogId, SftTime)
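The effect of the 4-hour offset can be checked on a few timestamps: the earliest and latest values from the skim() output above, plus the 02:47:30 record from the str() output. The first two belong to the night of 11 to 12 April and, after the shift, end up on the same calendar day.

```r
library(lubridate)

# Night-time timestamps; UTC avoids daylight-saving surprises
times <- as.POSIXct(c("2016-04-11 20:48:00", "2016-04-12 02:47:30",
                      "2016-05-12 08:00:00"), tz = "UTC")

# Move every time 4 hours forward: 20:00 -> 00:00, 08:00 -> 12:00
shifted <- times + hours(4)

print(format(shifted, "%Y-%m-%d %H:%M:%S"))
```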
Since I am interested in the time each sleep started, only the first record of each LogId is needed (this relies on the records within each log being in chronological order):
MSM_subset <- MSM_subset %>%
group_by(LogId) %>%
slice(1)
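slice(1) returns whatever row comes first, so if the rows were ever out of order it would not pick the start of the sleep. A slightly more defensive sketch, on hypothetical toy data, sorts by the shifted time first. After the offset, all times of one night fall between 00:00 and 12:00 on the same day, so sorting the zero-padded SftTime strings is chronological.

```r
library(dplyr)

# Hypothetical toy data: two sleep logs whose rows are NOT in time order
toy <- tibble(LogId = c("A", "A", "A", "B", "B"),
              SftTime = c("07:10:00", "06:47:30", "06:48:30",
                          "04:00:00", "03:27:00"))

# Sort within each log, then keep the earliest record
starts <- toy %>%
  arrange(LogId, SftTime) %>%
  group_by(LogId) %>%
  slice(1) %>%
  ungroup()

print(starts)
```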
Now that I have the start time of each sleep, I can join it to the main dataset:
MSM_new <- right_join(minute_sleep_merged, MSM_subset, by = c("LogId", "Id"))
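Now let’s calculate the average sleep quality for each logged sleep. In the minute sleep data, value 1 means asleep, 2 restless and 3 awake, so a lower mean indicates better sleep. To be able to manipulate the values in the DateTime column, its content has to be exactly that: a date and a time. Since the dates themselves do not interest me, I can simply use today’s date as the date part: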
MSM_new <- MSM_new %>%
select(Id, LogId, DateTime, SftTime, Value) %>%
group_by(Id, LogId, SftTime) %>%
summarise(MeanValue = mean(Value)) %>%
mutate(Date = today()) %>%
unite("DateTime", Date, SftTime, sep = " ")
## `summarise()` has grouped output by 'Id', 'LogId'. You can override using the
## `.groups` argument.
MSM_new$DateTime <- as_datetime(MSM_new$DateTime)
tibble(MSM_new)
## # A tibble: 420 × 4
## Id LogId DateTime MeanValue
## <fct> <fct> <dttm> <dbl>
## 1 1503960366 11380564589 2023-08-01 06:47:30 1.08
## 2 1503960366 11388770715 2023-08-01 07:08:30 1.10
## 3 1503960366 11388770716 2023-08-01 00:10:00 1
## 4 1503960366 11402722600 2023-08-01 06:59:00 1.05
## 5 1503960366 11421831252 2023-08-01 06:11:00 1.08
## 6 1503960366 11421831253 2023-08-01 11:02:00 1.14
## 7 1503960366 11421831254 2023-08-01 03:27:00 1.01
## 8 1503960366 11439580762 2023-08-01 06:06:30 1.05
## 9 1503960366 11447640793 2023-08-01 06:01:00 1.06
## 10 1503960366 11455720858 2023-08-01 06:32:30 1.10
## # … with 410 more rows
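An alternative to attaching today’s date, shown here only as a sketch and not the method used above, is to express each start time as the number of hours elapsed since 20:00. This gives a plain numeric axis with no fake dates and no offset to undo. The bedtimes below are hypothetical.

```r
library(lubridate)

# Hypothetical bedtimes in real (unshifted) clock time
bedtimes <- as.POSIXct(c("2016-04-11 20:48:00", "2016-04-12 02:47:30",
                         "2016-04-12 23:30:00"), tz = "UTC")

clock_hours <- hour(bedtimes) + minute(bedtimes) / 60 + second(bedtimes) / 3600

# Hours elapsed since 20:00; modulo 24 wraps times after midnight forward
hours_after_20 <- (clock_hours - 20) %% 24

print(round(hours_after_20, 3))
```

Such a column could be plotted with scale_x_continuous(breaks = seq(0, 12, 2), labels = c("20:00", "22:00", "00:00", "02:00", "04:00", "06:00", "08:00")).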
Plot the set and see what it shows:
(sleep_quality_vs_time <- ggplot(MSM_new, aes(x =DateTime, y = MeanValue))+
geom_point()+
geom_smooth(method = 'loess',
formula = 'y ~ x',
color = "#fe7f2d")+
scale_x_datetime(date_breaks = "2 hours",
# shift the break labels back by 4 hours to show the real clock time
labels = function(x) format(x - hours(4), "%H:%M"))+
scale_y_continuous(limits = c(1, 1.6))+
theme_minimal()+
labs(title = "Bedtime vs. quality of sleep",
x = "",
y = "Sleep quality index"))
I started this project trying to find patterns and insights in the supplied data. The primary question I tried to answer in this analysis was “What are the trends in the use of fitness tracking devices?”. I tried to apply my limited knowledge of how these smart devices work and how they are used to find typical flaws users have to deal with, and how these flaws could be mitigated, if not eliminated completely. Seeing a considerable difference in the use of functions such as step tracking (33 participants), sleep monitoring (24 participants) and heart rate monitoring (14 participants), one of my hypotheses was that this difference exists because these functions consume a lot of energy and users did not enable them in order to extend battery life. Unfortunately, I was not able to find any evidence to support that hypothesis. Such user behaviour is probably better explained by the fact that step tracking is enabled by default, while the latter two functions need to be enabled manually. Since older people often have difficulty operating new products, in my opinion access to users’ age data, as well as to how often the fitness tracker and the user’s phone synchronize, would definitely help shed light on this mystery. As for the trends, here are the key observations I made during this analysis:
According to the article Ku, PW., Steptoe, A., Liao, Y. et al. A cut-off of daily sedentary time and all-cause mortality in adults: a meta-regression analysis involving more than 1 million participants. BMC Med 16, 74 (2018). https://doi.org/10.1186/s12916-018-1062-2, there is “a log-linear dose-response association between daily sedentary time and all-cause mortality”. To lower the risk of all-cause mortality, “it may be appropriate to encourage adults to engage in less sedentary behaviors, with fewer than 9 h a day”. In this dataset the mean daily sedentary time is about 955 minutes (almost 16 hours), well above that threshold. One way to encourage less sedentary behaviour might be a reminder on the device’s screen to get up and perform a suggested exercise every 60 minutes.
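The 9-hour threshold from the article translates to 540 minutes of SedentaryMinutes per day. This sketch counts how many days exceed it, using the first ten days of one user copied from the str() output above; for the whole dataset the vector would be daily_activity_merged$SedentaryMinutes.

```r
# SedentaryMinutes for the first ten days of one user (from the str() output)
sedentary_min <- c(728, 776, 1218, 726, 773, 539, 1149, 775, 818, 838)

threshold_min <- 9 * 60   # 9 hours = 540 minutes

over <- sedentary_min > threshold_min
cat("days over 9 h:", sum(over), "of", length(over),
    sprintf("(%.0f%%)\n", 100 * mean(over)))
```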
This correlation tells us that users prefer to perform exercises that involve taking steps, such as walking, jogging or running rather than strength exercises such as lifting weights.
This tells us two things. First, the precision of the calculations performed by the device is up to standard. Second, users who jog and run do it outdoors rather than at a gym on a treadmill or an elliptical machine. This means that the fitness tracking device has to meet certain requirements for enclosure protection and for screen legibility in bright sunlight and while running.
Using this information, fitness device manufacturers can choose better timing for their advertising campaigns and promotions and expect a better response from the audience.
This analysis provides no evidence that bedtime causes sleep quality to improve or worsen. However, according to the article Shahram Nikbakhtian, Angus B Reed, Bernard Dillon Obika, Davide Morelli, Adam C Cunningham, Mert Aral, David Plans, Accelerometer-derived sleep onset timing and cardiovascular disease incidence: a UK Biobank cohort study, European Heart Journal - Digital Health, Volume 2, Issue 4, December 2021, Pages 658–666, https://doi.org/10.1093/ehjdh/ztab088, the findings suggest “the possibility of a relationship between sleep onset timing and risk of developing CVD, particularly for women”, and “sleep onset timings earlier than 10 pm and later than 11 pm were associated with increased risk of CVD”. So it may be a good idea to notify users of a suggested bedtime of around 10 pm, explaining the positive effects of such a routine.