This is an analysis project for a growing company called Bellabeat that focuses on female wellness technology products. Their products include:
- Bellabeat app: An app that provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
- Leaf: Bellabeat’s classic wellness tracker, which can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
- Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
- Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
- Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
The business task is to discover trends in smart device usage using a dataset from a similar, more established company, relate those trends to Bellabeat customers, and make strategic product marketing decisions based on the observed usage trends.
The data has been sourced from the FitBit Fitness Tracker Data set on Kaggle (https://www.kaggle.com/datasets/arashnic/fitbit; CC0: Public Domain, dataset made available through Mobius). It contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
Based on the above information, the data is available for public use as a public domain dataset and can be said to be from a credible source, FitBit, making it reliable. It is also a cited dataset and so can be trusted.
However, it may not be fully ROCCC-compliant, where ROCCC stands for reliable, original, comprehensive, current, and cited. There is no information about the demographics of the participants, such as age, sex, or the presence of any health conditions, and a lot of data is missing for some measured variables. The dataset is therefore incomplete and could contain some bias. In addition, the data is not current: it was collected in 2016 and is being used in 2024, and many factors relating to the measured variables or the measurement methods may have changed since then.
Therefore, judgements made from this dataset would have to be further verified using more current and comprehensive datasets.
install.packages('tidyverse')
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
install.packages('lubridate')
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
install.packages('dplyr')
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
install.packages('ggplot2')
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
install.packages('tidyr')
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
daily_activity <- read.csv("/cloud/project/Bellabeat_Dataset/dailyActivity_merged1.csv")
daily_sleep <- read.csv("/cloud/project/Bellabeat_Dataset/sleepDay_merged.csv")
daily_calories <- read.csv("/cloud/project/Bellabeat_Dataset/dailyCalories_merged.csv")
daily_intensities <- read.csv("/cloud/project/Bellabeat_Dataset/dailyIntensities_merged.csv")
daily_steps <- read.csv("/cloud/project/Bellabeat_Dataset/dailySteps_merged.csv")
head(daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
colnames(daily_activity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
colnames(daily_sleep)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
head(daily_sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
head(daily_calories)
## Id ActivityDay Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
head(daily_intensities)
## Id ActivityDay SedentaryMinutes LightlyActiveMinutes
## 1 1503960366 4/12/2016 728 328
## 2 1503960366 4/13/2016 776 217
## 3 1503960366 4/14/2016 1218 181
## 4 1503960366 4/15/2016 726 209
## 5 1503960366 4/16/2016 773 221
## 6 1503960366 4/17/2016 539 164
## FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## 1 13 25 0
## 2 19 21 0
## 3 11 30 0
## 4 34 29 0
## 5 10 36 0
## 6 20 38 0
## LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
## 1 6.06 0.55 1.88
## 2 4.71 0.69 1.57
## 3 3.91 0.40 2.44
## 4 2.83 1.26 2.14
## 5 5.04 0.41 2.71
## 6 2.51 0.78 3.19
head(daily_steps)
## Id ActivityDay StepTotal
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
While exploring the data, I noticed that the date and time formats are not the same across all dataframes, although each has an Id column and a date column. I will therefore standardize the date/time formats across the dataframes before merging some of them on their shared columns for further exploration.
# The intensities, calories, steps, and activity files store dates as "4/12/2016" with no time of day,
# so they are parsed with "%m/%d/%Y"; a date-time format would return NA for every row.
# No separate time column is derived because these daily files carry no time of day.
daily_intensities$ActivityDay <- as.POSIXct(daily_intensities$ActivityDay, format = "%m/%d/%Y", tz = Sys.timezone())
daily_intensities$date <- format(daily_intensities$ActivityDay, format = "%m/%d/%y")
daily_calories$ActivityDay <- as.POSIXct(daily_calories$ActivityDay, format = "%m/%d/%Y", tz = Sys.timezone())
daily_calories$date <- format(daily_calories$ActivityDay, format = "%m/%d/%y")
daily_activity$ActivityDate <- as.POSIXct(daily_activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
daily_activity$date <- format(daily_activity$ActivityDate, format = "%m/%d/%y")
# Only the sleep file carries a time stamp ("4/12/2016 12:00:00 AM")
daily_sleep$SleepDay <- as.POSIXct(daily_sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
daily_sleep$date <- format(daily_sleep$SleepDay, format = "%m/%d/%y")
daily_steps$ActivityDay <- as.POSIXct(daily_steps$ActivityDay, format = "%m/%d/%Y", tz = Sys.timezone())
daily_steps$date <- format(daily_steps$ActivityDay, format = "%m/%d/%y")
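Since the files use two date layouts, it may be worth confirming that parsing did not silently produce NA values. The sketch below uses two toy strings copied from the `head()` output above rather than the real columns; `parse_day` is a hypothetical helper, and `tz = "UTC"` is an assumption made only to keep the example reproducible.

```r
# Toy strings in the two layouts seen in the head() output above
date_only <- "4/12/2016"
date_time <- "4/12/2016 12:00:00 AM"

# A date-time format applied to a date-only string yields NA rather than an error
mismatch <- as.POSIXct(date_only, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
is.na(mismatch)

# Hypothetical helper: parse only the date part. strptime-style parsing
# ignores trailing characters, so it handles both layouts.
parse_day <- function(x) as.POSIXct(x, format = "%m/%d/%Y", tz = "UTC")

parsed <- parse_day(c(date_only, date_time))
stopifnot(!anyNA(parsed))   # fails loudly if any date did not parse
day_key <- format(parsed, "%m/%d/%y")
day_key
```

On the real dataframes, the same `anyNA()` check could be run on each parsed date column before merging.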
n_distinct(daily_activity$Id)
## [1] 33
n_distinct(daily_intensities$Id)
## [1] 33
n_distinct(daily_calories$Id)
## [1] 33
n_distinct(daily_sleep$Id)
## [1] 24
n_distinct(daily_steps$Id)
## [1] 33
All the dataframes have 33 distinct participants except daily_sleep, which has only 24. Note that 33 is three more than the thirty users stated in the dataset description. A sample of 24 participants is small, so conclusions drawn from the daily_sleep dataframe may not be statistically reliable.
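Beyond counting distinct IDs, it can help to see which participants are missing from the sleep data and how many days each one contributed. The following is a minimal sketch on toy stand-ins for the dataframes; the IDs and dates are made up for illustration and are not taken from the dataset.

```r
# Toy stand-ins for two of the dataframes (IDs shortened for illustration)
activity <- data.frame(
  Id   = c(1, 1, 2, 2, 3),
  date = c("04/12/16", "04/13/16", "04/12/16", "04/13/16", "04/12/16")
)
sleep <- data.frame(
  Id   = c(1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/12/16")
)

# Participants who logged activity but never logged sleep
no_sleep_ids <- setdiff(unique(activity$Id), unique(sleep$Id))
no_sleep_ids

# Number of days each participant contributed to the sleep data
days_per_id <- table(sleep$Id)
days_per_id
```

With the real data, `activity` and `sleep` would be replaced by `daily_activity` and `daily_sleep`.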
nrow(daily_activity)
## [1] 940
nrow(daily_intensities)
## [1] 940
nrow(daily_calories)
## [1] 940
nrow(daily_sleep)
## [1] 413
nrow(daily_steps)
## [1] 940
Again, each dataframe has 940 observations except the daily_sleep dataframe which has just 413 observations.
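Row counts alone do not show whether a dataframe contains repeated rows. A quick duplicate check could look like the sketch below; the `toy` dataframe is invented for illustration, and nothing here establishes whether the real files contain duplicates.

```r
# Toy dataframe with one exact duplicate row (values are illustrative)
toy <- data.frame(
  Id      = c(1, 1, 1, 2),
  date    = c("04/12/16", "04/13/16", "04/13/16", "04/12/16"),
  minutes = c(327, 384, 384, 412)
)

# duplicated() flags the second and later copies of identical rows
n_dupes <- sum(duplicated(toy))
n_dupes

# unique() keeps the first copy of each row
deduped <- unique(toy)
nrow(deduped)
```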
#### a. For the daily activity dataframe:
daily_activity %>%
select(TotalSteps,
TotalDistance,
SedentaryMinutes) %>%
summary()
## TotalSteps TotalDistance SedentaryMinutes
## Min. : 0 Min. : 0.000 Min. : 0.0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8
## Median : 7406 Median : 5.245 Median :1057.5
## Mean : 7638 Mean : 5.490 Mean : 991.2
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5
## Max. :36019 Max. :28.030 Max. :1440.0
#### b. For the daily intensities data frame:
daily_intensities %>%
select(SedentaryMinutes,
LightlyActiveMinutes,
FairlyActiveMinutes,
VeryActiveMinutes) %>%
summary()
## SedentaryMinutes LightlyActiveMinutes FairlyActiveMinutes VeryActiveMinutes
## Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 729.8 1st Qu.:127.0 1st Qu.: 0.00 1st Qu.: 0.00
## Median :1057.5 Median :199.0 Median : 6.00 Median : 4.00
## Mean : 991.2 Mean :192.8 Mean : 13.56 Mean : 21.16
## 3rd Qu.:1229.5 3rd Qu.:264.0 3rd Qu.: 19.00 3rd Qu.: 32.00
## Max. :1440.0 Max. :518.0 Max. :143.00 Max. :210.00
#### c. For the daily calories dataframe:
daily_calories %>%
select(Calories) %>%
summary()
## Calories
## Min. : 0
## 1st Qu.:1828
## Median :2134
## Mean :2304
## 3rd Qu.:2793
## Max. :4900
#### d. For the daily steps dataframe:
daily_steps %>%
select(StepTotal) %>%
summary()
## StepTotal
## Min. : 0
## 1st Qu.: 3790
## Median : 7406
## Mean : 7638
## 3rd Qu.:10727
## Max. :36019
#### e. For the sleep dataframe:
daily_sleep %>%
select(TotalSleepRecords,
TotalMinutesAsleep,
TotalTimeInBed) %>%
summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.000 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.000 1st Qu.:361.0 1st Qu.:403.0
## Median :1.000 Median :433.0 Median :463.0
## Mean :1.119 Mean :419.5 Mean :458.6
## 3rd Qu.:1.000 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.000 Max. :796.0 Max. :961.0
Mean total steps across participants was 7,638 per day, slightly below the minimum of about 8,000 steps that 2023 research led by the University of Granada associated with maintaining a healthy cardiovascular system and reducing all-cause mortality. Many participants would need to increase their total steps and, in turn, reduce their sedentary time.
Mean sedentary time was a whopping 16.5 hours per day (991.2 minutes). According to a study by a team of researchers from the University of Mississippi and Pusan National University, less than 6 hours of daily sedentary time is recommended for healthy living and the prevention of obesity and heart disease.
On average, participants spent roughly 40 minutes more in bed than asleep (458.6 versus 419.5 minutes). This could be improved: according to SleepFoundation.org, a healthy individual typically takes about 15 to 20 minutes to fall asleep after going to bed.
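The figures quoted above follow directly from the means in the `summary()` output. As a small worked check (the variable names are only illustrative):

```r
# Means copied from the summary() output above
mean_sedentary_min <- 991.2
mean_time_in_bed   <- 458.6
mean_min_asleep    <- 419.5

# Convert sedentary minutes to hours
sedentary_hours <- mean_sedentary_min / 60
round(sedentary_hours, 1)   # 16.5

# Average time spent in bed but not asleep, in minutes
awake_in_bed <- mean_time_in_bed - mean_min_asleep
round(awake_in_bed, 1)      # 39.1
```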
Thus, I would like to find out how the active or sedentary lifestyle of the participants relates to their sleep patterns and to other factors such as calories.
This would likely help the company target its marketing at those who need better sleep quality, with the aim of narrowing the gap between time spent in bed and time actually asleep. Notifications to increase movement throughout the day can help those who need to burn more calories to stay healthy or maintain a healthy weight.
I will merge the daily_sleep and daily_activity dataframes to get better insights through visualizations, since they are the two most distinct dataframes: the daily activity dataframe already contains most of what is in the others, but none of the sleep data. I will use an outer join because the sleep dataframe has fewer rows than the activity dataframe; this retains all of the data in both dataframes.
merged <- merge(daily_sleep, daily_activity, by = c("Id","date"), all = TRUE)
head(merged)
## Id date SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 04/12/16 2016-04-12 1 327
## 2 1503960366 04/13/16 2016-04-13 2 384
## 3 1503960366 04/14/16 <NA> NA NA
## 4 1503960366 04/15/16 2016-04-15 1 412
## 5 1503960366 04/16/16 2016-04-16 2 340
## 6 1503960366 04/17/16 2016-04-17 1 700
## TotalTimeInBed ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 346 2016-04-12 13162 8.50 8.50
## 2 407 2016-04-13 10735 6.97 6.97
## 3 NA 2016-04-14 10460 6.74 6.74
## 4 442 2016-04-15 9762 6.28 6.28
## 5 367 2016-04-16 12669 8.16 8.16
## 6 712 2016-04-17 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
Just to confirm that everything went well, I will check how many participants are in the combined dataset.
n_distinct(merged$Id)
## [1] 33
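To illustrate what the outer join does with days that have activity but no sleep record, here is a small sketch. It uses three rows copied from the `head()` outputs above; the comparison with an inner join is added only for contrast and is not part of the analysis.

```r
# Three activity days and two sleep days for the same participant,
# copied from the head() outputs above
sleep <- data.frame(
  Id = c(1503960366, 1503960366),
  date = c("04/12/16", "04/13/16"),
  TotalMinutesAsleep = c(327, 384)
)
activity <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  date = c("04/12/16", "04/13/16", "04/14/16"),
  TotalSteps = c(13162, 10735, 10460)
)

# Outer join: keeps every row from both sides, filling gaps with NA
outer <- merge(sleep, activity, by = c("Id", "date"), all = TRUE)
# Inner join: keeps only days present in both dataframes
inner <- merge(sleep, activity, by = c("Id", "date"))

nrow(outer)                            # 3
nrow(inner)                            # 2
sum(is.na(outer$TotalMinutesAsleep))   # 1
```

The NA rows created this way are the ones ggplot2 later reports as removed when sleep variables are plotted.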
Before plotting, I noticed a few redundant columns that should be removed to make the dataframe cleaner. I remove them from the merged dataframe using:
final_merged <- merged %>% select(-c(TrackerDistance, LoggedActivitiesDistance, SleepDay, ActivityDate))
head(final_merged)
## Id date TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 04/12/16 1 327 346
## 2 1503960366 04/13/16 2 384 407
## 3 1503960366 04/14/16 NA NA NA
## 4 1503960366 04/15/16 1 412 442
## 5 1503960366 04/16/16 2 340 367
## 6 1503960366 04/17/16 1 700 712
## TotalSteps TotalDistance VeryActiveDistance ModeratelyActiveDistance
## 1 13162 8.50 1.88 0.55
## 2 10735 6.97 1.57 0.69
## 3 10460 6.74 2.44 0.40
## 4 9762 6.28 2.14 1.26
## 5 12669 8.16 2.71 0.41
## 6 9705 6.48 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
I would like to sum the lightly, fairly, and very active minutes for each participant on each day so that I can use the total for some comparisons:
merged_final <- final_merged %>%
mutate(TotalActiveMinutes = rowSums(across(c(LightlyActiveMinutes,FairlyActiveMinutes,VeryActiveMinutes))))
head(merged_final)
## Id date TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 04/12/16 1 327 346
## 2 1503960366 04/13/16 2 384 407
## 3 1503960366 04/14/16 NA NA NA
## 4 1503960366 04/15/16 1 412 442
## 5 1503960366 04/16/16 2 340 367
## 6 1503960366 04/17/16 1 700 712
## TotalSteps TotalDistance VeryActiveDistance ModeratelyActiveDistance
## 1 13162 8.50 1.88 0.55
## 2 10735 6.97 1.57 0.69
## 3 10460 6.74 2.44 0.40
## 4 9762 6.28 2.14 1.26
## 5 12669 8.16 2.71 0.41
## 6 9705 6.48 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
## TotalActiveMinutes
## 1 366
## 2 257
## 3 222
## 4 272
## 5 267
## 6 222
I would like to know whether the length of the participants' sleep depended on the amount of time they spent active.
ggplot(data = merged_final, aes(x = TotalActiveMinutes, y = TotalMinutesAsleep)) +
geom_point() +
geom_smooth() +
labs(title="Total Active Minutes vs. Total Minutes Asleep", caption = 'Data Source: FitBit Fitness Tracker Data')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 530 rows containing missing values or values outside the scale range
## (`geom_point()`).
Insights: Being more active overall appeared more beneficial than being sedentary with respect to sleeping longer, although being lightly active seemed to be the most beneficial form of activity in this regard.
Even though any activity is better than none, it is more beneficial to perform at least 30 minutes of moderate- to high-intensity physical activity daily for optimum heart health. Several researchers, including those at Johns Hopkins Medicine, also agree that moderate-intensity physical activity may improve sleep quality.
Hence, I move on to find out whether these observations apply to the participants in this dataset and how they could inform strategic marketing decisions for some Bellabeat products.
Is there any relationship between the more moderate to highly active hours and the actual length of time participants spend asleep?
I will begin by getting the sum of the more active minutes in a new column, thus:
merged_finally <- final_merged %>%
mutate(SumMoreActiveMinutes = rowSums(across(c(FairlyActiveMinutes, VeryActiveMinutes))))
head(merged_finally)
## Id date TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 04/12/16 1 327 346
## 2 1503960366 04/13/16 2 384 407
## 3 1503960366 04/14/16 NA NA NA
## 4 1503960366 04/15/16 1 412 442
## 5 1503960366 04/16/16 2 340 367
## 6 1503960366 04/17/16 1 700 712
## TotalSteps TotalDistance VeryActiveDistance ModeratelyActiveDistance
## 1 13162 8.50 1.88 0.55
## 2 10735 6.97 1.57 0.69
## 3 10460 6.74 2.44 0.40
## 4 9762 6.28 2.14 1.26
## 5 12669 8.16 2.71 0.41
## 6 9705 6.48 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
## SumMoreActiveMinutes
## 1 38
## 2 40
## 3 41
## 4 63
## 5 46
## 6 58
I will then be plotting the sum of more active minutes versus the total minutes asleep:
ggplot(data = merged_finally, aes(x = SumMoreActiveMinutes, y = TotalMinutesAsleep)) +
geom_point()+
geom_smooth() +
labs(title="Sum of More Active Time vs. Total Minutes Asleep",
caption = 'Data Source: FitBit Fitness Tracker Data')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 530 rows containing missing values or values outside the scale range
## (`geom_point()`).
Insights: Spending more time in moderate to very active activity per day seemed slightly beneficial for time asleep. However, those who did little to no such activity slept almost as long. This result may be biased, given that the sleep dataset had many missing values.
Furthermore, I want to check whether sedentary time has an influence on the length of time asleep.
So I plot sedentary minutes against total minutes asleep.
ggplot(data=merged_finally, aes(x=SedentaryMinutes, y = TotalMinutesAsleep)) +
geom_point(color='navyblue') +
geom_smooth() +
labs(title="Sedentary Minutes vs Total Time Asleep",
caption = 'Data Source: FitBit Fitness Tracker Data')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 530 rows containing missing values or values outside the scale range
## (`geom_point()`).
Insights: More sedentary time was associated with less sleep: participants who spent more time sedentary tended to get less than 7 hours of sleep. The more sedentary the day, the shorter the time asleep.
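A rank correlation is one way to put a number on a relationship like this. The sketch below only shows the mechanics: it uses the six rows from the `head()` output above (one participant), so its result says nothing about the full dataset. Using Spearman's method is an assumption made here because the smoothed curves do not look linear.

```r
# First six rows of the merged data, copied from the head() output above
first_rows <- data.frame(
  SedentaryMinutes   = c(728, 776, 1218, 726, 773, 539),
  TotalMinutesAsleep = c(327, 384, NA, 412, 340, 700)
)

# use = "complete.obs" drops the day with no sleep record before ranking
rho <- cor(first_rows$SedentaryMinutes,
           first_rows$TotalMinutesAsleep,
           method = "spearman",
           use = "complete.obs")
rho
```

On the real data, the two columns of `merged_finally` would be passed to `cor()` in the same way.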
Again, I want to know the relationship between the distance moved throughout the day and sleep length as well as its relationship with the amount of calories burnt per day. So I plot, first, total distance vs total sleep minutes.
ggplot(data=merged_finally, aes(y=TotalMinutesAsleep, x=TotalDistance)) +
geom_point(color='darkslategray') +
geom_smooth() +
labs(title="Minutes Asleep vs. Total Distance",
caption = 'Data Source: FitBit Fitness Tracker Data')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 530 rows containing missing values or values outside the scale range
## (`geom_point()`).
Insights: Distance covered in the course of physical activity didn’t seem to have any significant effect on the length of sleep time for participants.
ggplot(data=merged_finally, aes(y=Calories, x=TotalDistance)) +
geom_point(color='darkblue') +
geom_smooth() +
labs(title="Calories vs. Total Distance",
caption = 'Data Source: FitBit Fitness Tracker Data')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Insight: Unsurprisingly, participants who covered more distance burned more calories.
Still exploring the data, I would like to know whether sleep quality (efficiency) affects the physical activity of the participants. This matters because, if the effect is significant, the overall health of participants with poor sleep quality could be at risk. Such customers could be targeted with reminders or notifications to go to bed earlier so that they get more sleep, which would translate to better activity during the day and improve overall health outcomes.
I will start by calculating sleep efficiency (or quality), which is the percentage of total time in bed that is spent actually sleeping.
merged_finally = merged_finally %>%
mutate(SleepEfficiency = ((TotalMinutesAsleep/TotalTimeInBed)*100))
head(merged_finally)
## Id date TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 04/12/16 1 327 346
## 2 1503960366 04/13/16 2 384 407
## 3 1503960366 04/14/16 NA NA NA
## 4 1503960366 04/15/16 1 412 442
## 5 1503960366 04/16/16 2 340 367
## 6 1503960366 04/17/16 1 700 712
## TotalSteps TotalDistance VeryActiveDistance ModeratelyActiveDistance
## 1 13162 8.50 1.88 0.55
## 2 10735 6.97 1.57 0.69
## 3 10460 6.74 2.44 0.40
## 4 9762 6.28 2.14 1.26
## 5 12669 8.16 2.71 0.41
## 6 9705 6.48 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
## SumMoreActiveMinutes SleepEfficiency
## 1 38 94.50867
## 2 40 94.34889
## 3 41 NA
## 4 63 93.21267
## 5 46 92.64305
## 6 58 98.31461
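As a worked check of the formula, the sketch below recomputes the first two values shown above. `sleep_efficiency` is a hypothetical helper, not part of the analysis; the guard against a zero time in bed is an added assumption.

```r
# Hypothetical helper: percentage of time in bed spent asleep
sleep_efficiency <- function(minutes_asleep, minutes_in_bed) {
  # Return NA when time in bed is missing or zero to avoid dividing by zero
  ifelse(is.na(minutes_in_bed) | minutes_in_bed == 0,
         NA_real_,
         minutes_asleep / minutes_in_bed * 100)
}

# First three rows of the merged data (the third has no sleep record)
efficiency <- sleep_efficiency(c(327, 384, NA), c(346, 407, NA))
round(efficiency, 5)   # 94.50867 94.34889 NA
```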
How does good-quality sleep relate to the number of steps taken per day?
ggplot(data=merged_finally, aes(x=TotalSteps, y=SleepEfficiency)) +
geom_line() +
geom_smooth() +
labs(title="Sleep Efficiency vs. Total Steps",
caption = 'Data Source: FitBit Fitness Tracker Data')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 530 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 87 rows containing missing values or values outside the scale range
## (`geom_line()`).
Insights: From the plot, sleep efficiency seemed to improve steadily from around 15,000 steps upward. This again suggests that higher activity is more beneficial to sleep than no activity at all. On average, participants took fewer than 8,000 steps per day, roughly half the number of steps associated with better sleep in this case.
#Getting the sum of all daily active minutes per participant:
merged_finally2 <- merged_finally %>%
mutate(TotalActiveMinutes = rowSums(across(c(LightlyActiveMinutes,FairlyActiveMinutes,VeryActiveMinutes))),
SleepEfficiency = ((TotalMinutesAsleep/TotalTimeInBed)*100))
head(merged_finally2)
## Id date TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 04/12/16 1 327 346
## 2 1503960366 04/13/16 2 384 407
## 3 1503960366 04/14/16 NA NA NA
## 4 1503960366 04/15/16 1 412 442
## 5 1503960366 04/16/16 2 340 367
## 6 1503960366 04/17/16 1 700 712
## TotalSteps TotalDistance VeryActiveDistance ModeratelyActiveDistance
## 1 13162 8.50 1.88 0.55
## 2 10735 6.97 1.57 0.69
## 3 10460 6.74 2.44 0.40
## 4 9762 6.28 2.14 1.26
## 5 12669 8.16 2.71 0.41
## 6 9705 6.48 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
## SumMoreActiveMinutes SleepEfficiency TotalActiveMinutes
## 1 38 94.50867 366
## 2 40 94.34889 257
## 3 41 NA 222
## 4 63 93.21267 272
## 5 46 92.64305 267
## 6 58 98.31461 222
#Plot:
merged_finally2 %>%
  select(SleepEfficiency, TotalActiveMinutes) %>%
  # case_when() returns NA when no condition matches, e.g. days with no sleep record
  mutate(sleep_quality = case_when(SleepEfficiency < 60 ~ 'Poor sleep',
                                   SleepEfficiency < 80 ~ 'Good sleep',
                                   SleepEfficiency <= 100 ~ 'Excellent sleep'),
         active_level = case_when(TotalActiveMinutes >= 150 ~ 'High activity',
                                  TotalActiveMinutes >= 30 ~ 'Moderate activity',
                                  TotalActiveMinutes >= 1 ~ 'Low activity',
                                  TotalActiveMinutes >= 0 ~ 'Sedentary')) %>%
  select(-c(SleepEfficiency, TotalActiveMinutes)) %>%
  drop_na() %>%
  # count() returns an ungrouped table of counts per combination
  count(sleep_quality, active_level, name = "counts") %>%
  mutate(active_level = factor(active_level,
                               levels = c('Sedentary', 'Low activity',
                                          'Moderate activity',
                                          'High activity')),
         sleep_quality = factor(sleep_quality,
                                levels = c('Poor sleep', 'Good sleep',
                                           'Excellent sleep'))) %>%
  ggplot(aes(x = sleep_quality,
             y = counts,
             fill = sleep_quality)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("gold", "darkblue", "darkred")) +
  facet_wrap(~active_level, nrow = 1) +
  theme(legend.position = "none") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(strip.text = element_text(colour = 'black', size = 8)) +
  theme(strip.background = element_rect(fill = "antiquewhite1", color = 'black')) +
  labs(
    title = "Sleep quality by Level of Activity",
    x = "Sleep quality",
    y = "Count",
    caption = 'Data Source: FitBit Fitness Tracker Data')
Insights: With the limitations of our dataset in mind, those who were more active experienced better sleep quality overall. It seems that the more active a participant was, the better the quality of sleep. In the plot, poor sleep represents a sleep efficiency below 60%, good sleep represents 60% to just under 80%, and excellent sleep represents 80% and above.
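The same binning can also be expressed with base R's `cut()`, which makes the boundaries explicit. This is an alternative sketch rather than the code used for the plot; the test values are chosen only to sit on or near the 60% and 80% boundaries.

```r
# Illustrative efficiencies on and around the category boundaries
efficiency <- c(59.9, 60, 79.9, 80, 100, NA)

# right = FALSE makes the bins [-Inf, 60), [60, 80), [80, Inf),
# matching "below 60", "60 to just under 80", and "80 and above"
sleep_quality <- cut(efficiency,
                     breaks = c(-Inf, 60, 80, Inf),
                     labels = c("Poor sleep", "Good sleep", "Excellent sleep"),
                     right = FALSE)
as.character(sleep_quality)
```

`cut()` returns an ordered set of factor levels directly, so no separate `factor()` call is needed before plotting.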
Will this be the same for length of time asleep? Let’s check it out:
merged_finally2 %>%
  select(TotalMinutesAsleep, TotalActiveMinutes) %>%
  # Under 7 hours (420 min) is poor, 7 to 9 hours is optimal, over 9 hours (540 min) is excess
  mutate(sleep_quality = case_when(TotalMinutesAsleep < 420 ~ 'Poor Sleep',
                                   TotalMinutesAsleep <= 540 ~ 'Optimal Sleep',
                                   TotalMinutesAsleep > 540 ~ 'Excess Sleep'),
         # Same activity bins as the previous plot, so days with zero active minutes are 'Sedentary'
         active_level = case_when(TotalActiveMinutes >= 150 ~ 'High activity',
                                  TotalActiveMinutes >= 30 ~ 'Moderate activity',
                                  TotalActiveMinutes >= 1 ~ 'Low activity',
                                  TotalActiveMinutes >= 0 ~ 'Sedentary')) %>%
  select(-c(TotalMinutesAsleep, TotalActiveMinutes)) %>%
  drop_na() %>%
  # count() returns an ungrouped table of counts per combination
  count(sleep_quality, active_level, name = "counts") %>%
  mutate(active_level = factor(active_level,
                               levels = c('Sedentary', 'Low activity',
                                          'Moderate activity',
                                          'High activity')),
         sleep_quality = factor(sleep_quality,
                                levels = c('Poor Sleep', 'Optimal Sleep',
                                           'Excess Sleep'))) %>%
  ggplot(aes(x = sleep_quality,
             y = counts,
             fill = sleep_quality)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("#99CCFF", "#336699", "#000066")) +
  facet_wrap(~active_level, nrow = 1) +
  theme(legend.position = "none") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(strip.text = element_text(colour = 'black', size = 8)) +
  theme(strip.background = element_rect(fill = "#CCFFFF", color = 'black')) +
  labs(
    title = "Length of Sleep by Level of Activity",
    x = "Length of Sleep",
    y = "Count",
    caption = 'Data Source: FitBit Fitness Tracker Data')
Insights: Once again, more time spent active seemed to help participants sleep longer. On the flip side, however, it did not seem to benefit the length of sleep on some occasions. One possible explanation is that increased activity close to bedtime may make it harder to wind down and fall asleep once in bed. In the plot, optimal sleep is a total sleep time of 7 to 9 hours, poor sleep is less than 7 hours, and excess sleep is more than 9 hours.
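Because the activity groups contain very different numbers of days, raw counts can be hard to compare across facets. One option is to look at shares within each activity level instead. The sketch below uses made-up counts purely to show the calculation; they are not results from this dataset.

```r
# Made-up counts for illustration only
counts <- data.frame(
  active_level = c("Moderate activity", "Moderate activity",
                   "High activity", "High activity"),
  sleep_length = c("Poor Sleep", "Optimal Sleep",
                   "Poor Sleep", "Optimal Sleep"),
  counts       = c(10, 10, 30, 90)
)

# ave() returns the group total for every row, so the division
# gives each category's share within its activity level
counts$share <- counts$counts / ave(counts$counts, counts$active_level, FUN = sum)
counts
```

Plotting `share` instead of `counts` on the y-axis would put all facets on the same 0 to 1 scale.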
Mean total steps across participants was 7,638 per day, slightly below the minimum of about 8,000 steps that 2023 research led by the University of Granada associated with maintaining a healthy cardiovascular system and reducing all-cause mortality. Many participants would need to increase their total steps and, in turn, reduce sedentary time. This analysis also suggested that taking more steps was beneficial not only for the heart but also for sleep quality, although the length of sleep was not significantly affected.
Most people spent a large part of their day sedentary, an average of 16.5 hours daily. There is no information about the age, gender, or health status of the participants, all of which can affect an individual's overall activity level. Based on the available data, however, most users of the tracker appear to have sedentary lifestyles.
On average, participants spent approximately 40 minutes more in bed than asleep. According to SleepFoundation.org, a healthy individual typically takes about 15 to 20 minutes to fall asleep after going to bed.
Most active time was spent on activities of light intensity. Not much time was spent in moderate to very active physical activity.
Being more active overall was more beneficial than being sedentary for both the length and the quality of sleep. However, it may help to reduce activity close to bedtime in order to gain more in terms of length of sleep.
As expected, covering more distance led to more calories burned, which could be especially beneficial to customers who are interested in maintaining a healthy weight.
Overweight or obese individuals are advised to spend less than 6 hours sedentary when not asleep. Overweight or obese customers could be encouraged to set healthy weight goals and achieve them by taking a targeted number of steps daily, using the Bellabeat app on any of their smart devices.
Because of the high level of sedentary behaviour among users, the Bellabeat app could regularly nudge customers to step away from their sedentary positions at intervals to increase daily activity. Generally, customers could be encouraged to take more steps, increasing their total to at least 8,000 steps daily, which studies have shown to be the optimal number of daily steps for healthy living. Any activity is better than no activity at all.
The WHO and many other health regulatory bodies around the world agree that a minimum of 150 minutes of moderate- to high-intensity activity weekly, or 30 minutes of similar-intensity activity 5 days a week, is beneficial for heart health. Maintaining this level of activity weekly can reduce the risk of all-cause and certain disease-specific mortality.
Thus, customers can be encouraged to use the Bellabeat app to intentionally schedule and perform suggested moderate- to high-intensity activities for at least 30 minutes daily, with daily reminder notifications.
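A reminder feature of this kind would need a simple check of progress against the 150-minute weekly target. The following is a minimal sketch using the six days shown in the `head()` output earlier. It assumes that fairly active plus very active minutes approximate moderate- to high-intensity activity, as in the analysis above, and six days is not a full week.

```r
# Six days for one participant, copied from the head() output earlier
six_days <- data.frame(
  FairlyActiveMinutes = c(13, 19, 11, 34, 10, 20),
  VeryActiveMinutes   = c(25, 21, 30, 29, 36, 38)
)

# Daily moderate- to high-intensity minutes (assumption: fairly + very active)
daily_mvpa <- six_days$FairlyActiveMinutes + six_days$VeryActiveMinutes

total_mvpa          <- sum(daily_mvpa)        # 286
meets_weekly_target <- total_mvpa >= 150      # TRUE
days_with_30_min    <- sum(daily_mvpa >= 30)  # 6
```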
Carnevale, V., Macciocchi, D., & Sessa, M. (2023). Daily sedentary time of less than six hours is beneficial for the prevention of obesity in US adults. SEMS Journal. https://doi.org/10.34045/SEMS/2023/19
World Health Organization. (2020). WHO guidelines on physical activity and sedentary behaviour. World Health Organization. https://www.who.int/publications/i/item/9789240015128
University of Granada. (2023, October 26). Scientists show for the first time how many steps to take each day to reduce the risk of premature death: 8,000. ScienceDaily. https://www.sciencedaily.com/releases/2023/10/231026131551.htm
Rausch-Phung, E., & Rehman, A. (2023, December 19). How long should it take to fall asleep? SleepFoundation.org. https://www.sleepfoundation.org/sleep-faqs/how-long-should-it-take-to-fall-asleep
Johns Hopkins Medicine. (n.d.). Exercising for better sleep. HopkinsMedicine.org. Retrieved July 11, 2024, from https://www.hopkinsmedicine.org/health/wellness-and-prevention/exercising-for-better-sleep