The Sleep Efficiency Dataset contains 452 observations of individuals' sleeping habits, including Age, Gender, Bedtime, Wake-up time, Sleep duration, Sleep efficiency, REM sleep percentage, Deep sleep percentage, and Light sleep percentage. The dataset also includes lifestyle factors that can influence sleeping patterns: Number of awakenings, Caffeine and/or Alcohol consumption, Smoking status, and Exercise frequency. Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency
library(lubridate)
library(dplyr)
library(scales)
library(ggthemes)
library(ggplot2)
library(ggrepel)
library(plotly)
sleep_df <- read.csv("Sleep_Efficiency.csv",row.names = "ID")
str(sleep_df)
## 'data.frame': 452 obs. of 14 variables:
## $ Age : int 65 69 40 40 57 36 27 53 41 11 ...
## $ Gender : chr "Female" "Male" "Female" "Female" ...
## $ Bedtime : chr "2021-03-06 01:00:00" "2021-12-05 02:00:00" "2021-05-25 21:30:00" "2021-11-03 02:30:00" ...
## $ Wakeup.time : chr "2021-03-06 07:00:00" "2021-12-05 09:00:00" "2021-05-25 05:30:00" "2021-11-03 08:30:00" ...
## $ Sleep.duration : num 6 7 8 6 8 7.5 6 10 6 9 ...
## $ Sleep.efficiency : num 0.88 0.66 0.89 0.51 0.76 0.9 0.54 0.9 0.79 0.55 ...
## $ REM.sleep.percentage : int 18 24 20 28 27 28 28 28 28 18 ...
## $ Deep.sleep.percentage : int 70 28 70 25 55 60 25 57 60 35 ...
## $ Light.sleep.percentage: int 10 53 10 52 18 17 52 20 17 45 ...
## $ Awakenings : num 0 3 1 3 3 0 2 0 3 4 ...
## $ Caffeine.consumption : num 0 0 0 50 0 NA 50 50 50 0 ...
## $ Alcohol.consumption : num 0 3 0 5 3 0 0 0 0 3 ...
## $ Smoking.status : chr "Yes" "Yes" "No" "Yes" ...
## $ Exercise.frequency : num 3 3 3 1 3 1 1 3 1 0 ...
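The str() output shows that Bedtime and Wakeup.time are stored as character strings, and that the third row pairs a 21:30 bedtime with a 05:30 wake-up on the same calendar date. A minimal sketch, assuming all timestamps share one time zone and that a wake-up earlier than the bedtime means the next morning, recovers the hours slept by wrapping the difference modulo 24. The helper name `duration_hours` is hypothetical.

```r
# Hypothetical helper: hours between bedtime and wake-up, wrapping past midnight.
# Assumes "YYYY-MM-DD HH:MM:SS" strings that all share one time zone.
duration_hours <- function(bedtime, wakeup) {
  bed  <- as.POSIXct(bedtime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  wake <- as.POSIXct(wakeup,  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  gap  <- as.numeric(difftime(wake, bed, units = "hours"))
  # A negative gap means the wake-up date was not rolled forward to the next day
  gap %% 24
}

# The first four rows as printed by str() above
bed  <- c("2021-03-06 01:00:00", "2021-12-05 02:00:00",
          "2021-05-25 21:30:00", "2021-11-03 02:30:00")
wake <- c("2021-03-06 07:00:00", "2021-12-05 09:00:00",
          "2021-05-25 05:30:00", "2021-11-03 08:30:00")
duration_hours(bed, wake)
```

For these four rows the result is 6, 7, 8 and 6, matching the first four Sleep.duration values.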
summary(sleep_df)
## Age Gender Bedtime Wakeup.time
## Min. : 9.00 Length:452 Length:452 Length:452
## 1st Qu.:29.00 Class :character Class :character Class :character
## Median :40.00 Mode :character Mode :character Mode :character
## Mean :40.29
## 3rd Qu.:52.00
## Max. :69.00
##
## Sleep.duration Sleep.efficiency REM.sleep.percentage Deep.sleep.percentage
## Min. : 5.000 Min. :0.5000 Min. :15 Min. :20.00
## 1st Qu.: 7.000 1st Qu.:0.6975 1st Qu.:20 1st Qu.:51.25
## Median : 7.500 Median :0.8200 Median :22 Median :60.00
## Mean : 7.466 Mean :0.7889 Mean :23 Mean :52.96
## 3rd Qu.: 8.000 3rd Qu.:0.9000 3rd Qu.:27 3rd Qu.:63.00
## Max. :10.000 Max. :0.9900 Max. :30 Max. :75.00
##
## Light.sleep.percentage Awakenings Caffeine.consumption
## Min. : 7.00 Min. :0.000 Min. : 0.00
## 1st Qu.:15.00 1st Qu.:1.000 1st Qu.: 0.00
## Median :18.00 Median :1.000 Median : 25.00
## Mean :24.83 Mean :1.641 Mean : 23.65
## 3rd Qu.:27.25 3rd Qu.:3.000 3rd Qu.: 50.00
## Max. :56.00 Max. :4.000 Max. :200.00
## NA's :20 NA's :25
## Alcohol.consumption Smoking.status Exercise.frequency
## Min. :0.000 Length:452 Min. :0.000
## 1st Qu.:0.000 Class :character 1st Qu.:0.000
## Median :0.000 Mode :character Median :2.000
## Mean :1.245 Mean :1.791
## 3rd Qu.:2.000 3rd Qu.:3.000
## Max. :5.000 Max. :5.000
## NA's :16 NA's :6
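summary() reports missing values in Awakenings, Caffeine.consumption, Alcohol.consumption and Exercise.frequency. A short sketch for listing those gaps and for keeping only the rows that are complete in the columns a given plot needs; the helpers `missing_report` and `complete_for` are hypothetical, and the toy data frame is illustrative, not taken from the Kaggle file.

```r
# Hypothetical helper: number of NAs in each column that has at least one.
missing_report <- function(df) {
  counts <- colSums(is.na(df))
  counts[counts > 0]
}

# Hypothetical helper: keep rows with no NA in the chosen columns only.
complete_for <- function(df, cols) {
  df[complete.cases(df[, cols, drop = FALSE]), , drop = FALSE]
}

# Toy frame for illustration only (not the Kaggle data)
toy <- data.frame(
  Sleep.efficiency   = c(0.88, 0.66, 0.89, 0.51),
  Exercise.frequency = c(3, NA, 3, 1),
  Awakenings         = c(0, 3, NA, 3)
)
missing_report(toy)
nrow(complete_for(toy, c("Sleep.efficiency", "Exercise.frequency")))
```

On sleep_df, missing_report(sleep_df) should list the same four columns and NA counts that summary() shows.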
# Mean sleep efficiency within 15 age bins (a plain histogram of Age would show counts, not efficiency)
ggplot(sleep_df, aes(x = Age, y = Sleep.efficiency)) +
  stat_summary_bin(fun = mean, geom = "bar", bins = 15) +
  labs(x = "Age", y = "Mean Sleep Efficiency", title = "Sleep Efficiency by Age", caption = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency") +
  theme_grey() +
  theme(plot.title = element_text(hjust = 0.5))
The plot above suggests a noticeable drop in sleep efficiency among
individuals in their late 30s, while efficiency peaks for most
individuals in their early 30s. This may reflect childbirth and
parenting responsibilities, aging, or other factors.
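The claim about age can be checked numerically by tabulating mean sleep efficiency per age band. A minimal sketch in base R, assuming 10-year age bands (an illustrative choice, not the bins used in the plot); `mean_eff_by_age` and the toy data are hypothetical.

```r
# Hypothetical helper: mean Sleep.efficiency per age band.
# Bands are left-closed, e.g. [30,40) holds ages 30 to 39.
mean_eff_by_age <- function(df, breaks = seq(0, 70, by = 10)) {
  band <- cut(df$Age, breaks = breaks, right = FALSE)
  out  <- aggregate(df$Sleep.efficiency, by = list(AgeBand = band), FUN = mean)
  names(out)[2] <- "MeanEfficiency"
  out  # bands with no observations are omitted
}

# Toy data for illustration only
toy_age <- data.frame(
  Age              = c(25, 28, 33, 37, 38),
  Sleep.efficiency = c(0.80, 0.90, 0.90, 0.60, 0.70)
)
mean_eff_by_age(toy_age)
```

Calling mean_eff_by_age(sleep_df) gives one row per age band present in the data; the default breaks cover the observed ages of 9 to 69.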
# Mean sleep efficiency for each exercise frequency (rows with a missing frequency are dropped)
ggplot(subset(sleep_df, !is.na(Exercise.frequency)),
       aes(x = factor(Exercise.frequency), y = Sleep.efficiency)) +
  stat_summary(fun = mean, geom = "bar", colour = "pink", fill = "pink") +
  labs(x = "Exercise Frequency", y = "Mean Sleep Efficiency", title = "Sleep Efficiency by Exercise Frequency", caption = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5))
The graph suggests that individuals who exercise more frequently tend
to have higher sleep efficiency, and the increase appears roughly
linear across exercise levels.
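Whether the relationship is close to linear can be quantified rather than read off the chart. A sketch, assuming Pearson's r as the measure of linear association and Spearman's rho as a rank-based check for a monotonic trend; rows with a missing value in either column are dropped. `exercise_cor` and the toy data are hypothetical.

```r
# Hypothetical helper: Pearson and Spearman correlation between
# Exercise.frequency and Sleep.efficiency, using complete pairs only.
exercise_cor <- function(df) {
  c(
    pearson  = cor(df$Exercise.frequency, df$Sleep.efficiency,
                   use = "complete.obs", method = "pearson"),
    spearman = cor(df$Exercise.frequency, df$Sleep.efficiency,
                   use = "complete.obs", method = "spearman")
  )
}

# Toy data for illustration only: a perfectly linear pattern plus one NA row
toy_ex <- data.frame(
  Exercise.frequency = c(0, 1, 2, 3, NA),
  Sleep.efficiency   = c(0.6, 0.7, 0.8, 0.9, 0.5)
)
exercise_cor(toy_ex)
```

A Pearson r close to 1 on sleep_df would support calling the relationship near linear; a value well below the Spearman value would suggest a monotonic but curved trend.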
ggplot(sleep_df, aes(x = Age, y = Sleep.efficiency, group = Smoking.status)) +
  # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  geom_line(aes(color = Smoking.status), linewidth = 1) +
  labs(x = "Age", y = "Sleep Efficiency", color = "Smoking Status", title = "Does Smoking Influence Sleep Efficiency?", caption = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency") +
  theme(plot.title = element_text(hjust = 0.5))
Smoking appears to be associated with lower sleep efficiency: the peaks
of the red line representing non-smokers are consistently higher than
those of the blue line representing smokers.
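The line chart compares peaks; a direct comparison of group means is a complementary check. A sketch assuming Welch's two-sample t-test (the default in t.test()) as the comparison; like the chart, it measures association only, not whether smoking causes lower efficiency. `smoking_summary` and the toy data are hypothetical.

```r
# Hypothetical helper: mean Sleep.efficiency per smoking group and a Welch t-test p-value.
smoking_summary <- function(df) {
  means <- tapply(df$Sleep.efficiency, df$Smoking.status, mean)
  means <- setNames(as.numeric(means), names(means))  # plain named vector
  test  <- t.test(Sleep.efficiency ~ Smoking.status, data = df)  # Welch by default
  list(means = means, p.value = test$p.value)
}

# Toy data for illustration only
toy_smoke <- data.frame(
  Smoking.status   = c("No", "No", "No", "Yes", "Yes", "Yes"),
  Sleep.efficiency = c(0.85, 0.90, 0.88, 0.60, 0.65, 0.70),
  stringsAsFactors = FALSE
)
smoking_summary(toy_smoke)
```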
boxplot(Deep.sleep.percentage~Awakenings, data=sleep_df,
col=("darkgreen"),
main="Deep Sleep Percentage by Number of Awakenings", xlab="Number of Awakenings",ylab = "Deep Sleep Percentage",sub = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency")
The box plot above shows that individuals with fewer than two
awakenings have a higher median deep sleep percentage than those with
two or more awakenings. Circles mark outliers.
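The box plot's comparison can also be expressed as medians per number of awakenings, plus the split at two awakenings described above. A sketch in base R; `deep_sleep_medians` and the toy data are hypothetical, and rows with a missing Awakenings value are dropped.

```r
# Hypothetical helper: median Deep.sleep.percentage per awakening count
# and for the groups "0-1" versus "2+".
deep_sleep_medians <- function(df) {
  ok    <- !is.na(df$Awakenings)
  deep  <- df$Deep.sleep.percentage[ok]
  wakes <- df$Awakenings[ok]
  by_count <- tapply(deep, wakes, median)
  by_group <- tapply(deep, ifelse(wakes >= 2, "2+", "0-1"), median)
  list(by_count = setNames(as.numeric(by_count), names(by_count)),
       by_group = setNames(as.numeric(by_group), names(by_group)))
}

# Toy data for illustration only
toy_deep <- data.frame(
  Awakenings            = c(0, 0, 1, 2, 3, NA),
  Deep.sleep.percentage = c(70, 60, 57, 28, 25, 55)
)
deep_sleep_medians(toy_deep)
```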
# Extract the bedtime hour (0-23), label it in 12-hour format, and count individuals per hour
bedtime_df <- sleep_df %>%
  select(Bedtime) %>%
  dplyr::mutate(hour24 = hour(ymd_hms(Bedtime))) %>%
  dplyr::mutate(hour24labs = recode(hour24,
                                    "21" = "9:00 pm",
                                    "22" = "10:00 pm",
                                    "23" = "11:00 pm",
                                    "0"  = "12:00 am",
                                    "1"  = "1:00 am",
                                    "2"  = "2:00 am")) %>%
  dplyr::count(hour24, hour24labs) %>%
  data.frame()
# Same steps for the wake-up hour
wakeup_df <- sleep_df %>%
  select(Wakeup.time) %>%
  dplyr::mutate(hour24 = hour(ymd_hms(Wakeup.time))) %>%
  dplyr::mutate(hour24labs = recode(hour24,
                                    "3"  = "3:00 am",
                                    "4"  = "4:00 am",
                                    "5"  = "5:00 am",
                                    "6"  = "6:00 am",
                                    "7"  = "7:00 am",
                                    "8"  = "8:00 am",
                                    "9"  = "9:00 am",
                                    "10" = "10:00 am",
                                    "11" = "11:00 am",
                                    "12" = "12:00 pm")) %>%
  dplyr::count(hour24, hour24labs) %>%
  data.frame()
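The two blocks above convert each timestamp to its hour of day, label the hours in 12-hour format, and count the individuals in each hour, which makes the pie charts easier to read.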
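The recode() calls list each hour label by hand. A shorter alternative is to compute the label arithmetically; the helper `hour_label` below is a hypothetical sketch that should reproduce the same "h:00 am/pm" strings for any hour from 0 to 23.

```r
# Hypothetical helper: 0-23 hour -> "h:00 am" / "h:00 pm" label.
hour_label <- function(hour24) {
  h12 <- hour24 %% 12
  h12[h12 == 0] <- 12                       # midnight and noon display as 12
  suffix <- ifelse(hour24 < 12, "am", "pm")
  sprintf("%d:00 %s", as.integer(h12), suffix)
}

hour_label(c(21, 0, 2, 12))
```

Inside the pipelines it could replace recode(), for example dplyr::mutate(hour24labs = hour_label(hour24)).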
# textinfo takes lowercase flags joined by "+", e.g. "label+percent"
plot_ly(bedtime_df, labels = ~hour24labs, values = ~n, type = "pie",
        textposition = "outside", textinfo = "label+percent") %>%
  layout(title = "Hour of Bedtime",
         annotations = list(x = 1, y = -0.1,
                            text = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency",
                            showarrow = FALSE, xref = "paper", yref = "paper",
                            xanchor = "right", yanchor = "auto", xshift = 0, yshift = 0,
                            font = list(size = 15, color = "black")))
The most common bedtime is midnight (12:00 am).
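The percentages in the pie chart are each hour's count divided by the total count. A sketch that computes the same shares from a count table with columns hour24labs and n, such as bedtime_df, and sorts them so the most common hour comes first; `hour_shares` and the toy counts are hypothetical.

```r
# Hypothetical helper: percentage share per hour, most common hour first.
hour_shares <- function(counts_df) {
  counts_df$pct <- round(100 * counts_df$n / sum(counts_df$n), 1)
  counts_df[order(-counts_df$n), c("hour24labs", "n", "pct")]
}

# Toy counts for illustration only
toy_counts <- data.frame(
  hour24labs = c("10:00 pm", "11:00 pm", "12:00 am"),
  n = c(2, 3, 5),
  stringsAsFactors = FALSE
)
hour_shares(toy_counts)
```

The first row of hour_shares(bedtime_df) gives the most common bedtime hour.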
plot_ly(wakeup_df, labels = ~hour24labs, values = ~n, type = "pie",
        textposition = "outside", textinfo = "label+percent") %>%
  layout(title = "Hour of Wake-up",
         annotations = list(x = 1, y = -0.1,
                            text = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency",
                            showarrow = FALSE, xref = "paper", yref = "paper",
                            xanchor = "right", yanchor = "auto", xshift = 0, yshift = 0,
                            font = list(size = 15, color = "black")))
Wake-up hours have a wider distribution than bedtime hours, with 5:00 am and 7:00 am the most common wake-up times.
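One way to make "wider distribution" concrete is to count the hourly slots between the earliest and latest hour. Bedtimes run across midnight, so a naive range would span nearly the whole day; the sketch below assumes bedtime hours are first shifted by 12 so that 9 pm to 2 am becomes the contiguous run 9 to 14. `hour_span` is hypothetical, and span is a crude measure that ignores how the counts are spread within the range.

```r
# Hypothetical helper: number of hourly slots from the earliest to the latest hour.
# With crosses_midnight = TRUE, hours are re-centred on noon:
# 21 -> 9, 0 -> 12, 2 -> 14, so an evening-to-early-morning run stays contiguous.
hour_span <- function(hour24, crosses_midnight = FALSE) {
  h <- if (crosses_midnight) (hour24 + 12) %% 24 else hour24
  diff(range(h)) + 1
}

hour_span(c(21, 22, 23, 0, 1, 2), crosses_midnight = TRUE)  # bedtime-style hours
hour_span(c(3, 5, 7, 12))                                   # wake-up-style hours
```

Without the shift, the bedtime-style hours would report a span of 24, which is why the re-centring matters.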
In sum, these plots suggest that a better sleep routine is associated with waking around sunrise, exercising often, and refraining from excessive alcohol, caffeine, and tobacco.