Sleep Efficiency Dataset

The Sleep Efficiency Dataset has 452 observations of individuals of their sleeping habits, including Age, Gender, Bedtime, Wake-up time, Sleep duration, Sleep efficiency, REM sleep percentage, Deep sleep percentage, Light sleep percentage. The dataset also includes lifestyle habits that can influence sleeping patterns: Number of awakenings, Caffeine and / or Alcohol usage, Smoking status and Exercise frequency. Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency

Libraries

library(lubridate)
library(dplyr)
library(scales)
library(ggthemes)
library(ggplot2)
library(ggrepel)
library(plotly)

Dataset

sleep_df <- read.csv("Sleep_Efficiency.csv",row.names = "ID")
str(sleep_df)
## 'data.frame':    452 obs. of  14 variables:
##  $ Age                   : int  65 69 40 40 57 36 27 53 41 11 ...
##  $ Gender                : chr  "Female" "Male" "Female" "Female" ...
##  $ Bedtime               : chr  "2021-03-06 01:00:00" "2021-12-05 02:00:00" "2021-05-25 21:30:00" "2021-11-03 02:30:00" ...
##  $ Wakeup.time           : chr  "2021-03-06 07:00:00" "2021-12-05 09:00:00" "2021-05-25 05:30:00" "2021-11-03 08:30:00" ...
##  $ Sleep.duration        : num  6 7 8 6 8 7.5 6 10 6 9 ...
##  $ Sleep.efficiency      : num  0.88 0.66 0.89 0.51 0.76 0.9 0.54 0.9 0.79 0.55 ...
##  $ REM.sleep.percentage  : int  18 24 20 28 27 28 28 28 28 18 ...
##  $ Deep.sleep.percentage : int  70 28 70 25 55 60 25 57 60 35 ...
##  $ Light.sleep.percentage: int  10 53 10 52 18 17 52 20 17 45 ...
##  $ Awakenings            : num  0 3 1 3 3 0 2 0 3 4 ...
##  $ Caffeine.consumption  : num  0 0 0 50 0 NA 50 50 50 0 ...
##  $ Alcohol.consumption   : num  0 3 0 5 3 0 0 0 0 3 ...
##  $ Smoking.status        : chr  "Yes" "Yes" "No" "Yes" ...
##  $ Exercise.frequency    : num  3 3 3 1 3 1 1 3 1 0 ...
summary(sleep_df)
##       Age           Gender            Bedtime          Wakeup.time       
##  Min.   : 9.00   Length:452         Length:452         Length:452        
##  1st Qu.:29.00   Class :character   Class :character   Class :character  
##  Median :40.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :40.29                                                           
##  3rd Qu.:52.00                                                           
##  Max.   :69.00                                                           
##                                                                          
##  Sleep.duration   Sleep.efficiency REM.sleep.percentage Deep.sleep.percentage
##  Min.   : 5.000   Min.   :0.5000   Min.   :15           Min.   :20.00        
##  1st Qu.: 7.000   1st Qu.:0.6975   1st Qu.:20           1st Qu.:51.25        
##  Median : 7.500   Median :0.8200   Median :22           Median :60.00        
##  Mean   : 7.466   Mean   :0.7889   Mean   :23           Mean   :52.96        
##  3rd Qu.: 8.000   3rd Qu.:0.9000   3rd Qu.:27           3rd Qu.:63.00        
##  Max.   :10.000   Max.   :0.9900   Max.   :30           Max.   :75.00        
##                                                                              
##  Light.sleep.percentage   Awakenings    Caffeine.consumption
##  Min.   : 7.00          Min.   :0.000   Min.   :  0.00      
##  1st Qu.:15.00          1st Qu.:1.000   1st Qu.:  0.00      
##  Median :18.00          Median :1.000   Median : 25.00      
##  Mean   :24.83          Mean   :1.641   Mean   : 23.65      
##  3rd Qu.:27.25          3rd Qu.:3.000   3rd Qu.: 50.00      
##  Max.   :56.00          Max.   :4.000   Max.   :200.00      
##                         NA's   :20      NA's   :25          
##  Alcohol.consumption Smoking.status     Exercise.frequency
##  Min.   :0.000       Length:452         Min.   :0.000     
##  1st Qu.:0.000       Class :character   1st Qu.:0.000     
##  Median :0.000       Mode  :character   Median :2.000     
##  Mean   :1.245                          Mean   :1.791     
##  3rd Qu.:2.000                          3rd Qu.:3.000     
##  Max.   :5.000                          Max.   :5.000     
##  NA's   :16                             NA's   :6

Sleep Efficiency by Age

ggplot(sleep_df,aes(x= Age , group = Sleep.efficiency))+
  geom_histogram(bins = 15)+
  labs(x = "Age", y = "Sleep Efficiency", title = "Sleep Efficiency by Age", caption = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency")+
  theme_grey()+
  theme(plot.title = element_text(hjust = 0.5))

Above it can be inferred that there is a significant drop in sleep efficiency in individuals in their late 30’s. Efficiency is primed for most individuals in their early 30’s. This may be credited to childbirth and parenting responsibilities, aging, or other factors.

Sleep Efficiency by Exercise Efficiency

ggplot(sleep_df,aes(x=Sleep.efficiency, y = Exercise.frequency))+
  geom_bar(colour="pink", fill="pink", stat="identity")+
  labs(x = "Sleep Efficiency", y = "Exercise Frequency", title = "Sleep Efficiency by Exercise Efficiency", caption = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency")+
  theme_light()+
  theme(plot.title = element_text(hjust = 0.5))

From the graph it is clear that those who have a higher frequency of exercise in turn have a higher sleep efficiency. The correlation has a near linear relationship.

Does Smoking Influence Sleep Efficiency?

ggplot(sleep_df,aes(x=Age, y=Sleep.efficiency, group = Smoking.status))+
  geom_line(aes(color=Smoking.status), size = 1)+
  labs(x = "Age", y = "Sleep Efficiency",color='Smoking Status', title = "Does Smoking Influence Sleep Efficiency?", caption = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency")+
  theme(plot.title = element_text(hjust = 0.5))

Smoking does influence Sleep efficiency as the peaks for the red line representing non-smokers are consistently higher than that of the blue line representing smokers.

Deep Sleep Percentage by Number of Awakenings

boxplot(Deep.sleep.percentage~Awakenings, data=sleep_df,
        col=("darkgreen"),
        main="Deep Sleep Percentage by Number of Awakenings", xlab="Number of Awakenings",ylab = "Deep Sleep Percentage",sub = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency")

The box plot above shows that those with less awakenings have a higher Deep sleep percentage than that of those with 2 or more awakenings. Outliers are represented by the circles.

DataFrame Manipulation

bedtime_df <- sleep_df%>%
  select(Bedtime)%>%
  dplyr::mutate(hour24 = hour(ymd_hms(Bedtime)))%>%
  dplyr::mutate(hour24labs = recode(hour24,"21" = "9:00 pm",
                                 "22" = "10:00 pm",
                                 "23" = "11:00 pm",
                                 "0" = "12:00 am", 
                                 "1" = "1:00 am",
                                 "2" = "2:00 am",)) %>%
  group_by(hour24,hour24labs)%>%
  dplyr::summarise(n=length(Bedtime), .groups = 'keep')%>%
  data.frame()

wakeup_df <- sleep_df%>%
  select(Wakeup.time)%>%
  dplyr::mutate(hour24 = hour(ymd_hms(Wakeup.time)))%>%
  dplyr::mutate(hour24labs = recode(hour24,"3" = "3:00 am",
                                    "4" = "4:00 am",
                                    "5" = "5:00 am",
                                    "6" = "6:00 am", 
                                    "7" = "7:00 am",
                                    "8" = "8:00 am",
                                    "9" = "9:00 am",
                                    "10" = "10:00 am", 
                                    "11" = "11:00 am",
                                    "12" = "12:00 pm",)) %>%
  group_by(hour24,hour24labs)%>%
  dplyr::summarise(n=length(Wakeup.time), .groups = 'keep')%>%
  data.frame()

Data Cleaning for a more understandable format.

Hour of Bedtime

plot_ly(bedtime_df, labels = ~hour24labs,values = ~n, type = "pie",
        textposition = "outside",textinfo= "Label + Percent")%>%
  layout(title= "Hour of Bedtime",annotations = 
           list(x = 1, y = -0.1, text = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency", 
                showarrow = F, xref='paper', yref='paper', 
                xanchor='right', yanchor='auto', xshift=0, yshift=0,
                font=list(size=15, color="black")))
The most common Bedtime for individuals is midnight

Hour of Wake-up

plot_ly(wakeup_df, labels = ~hour24labs,values = ~n, type = "pie",
        textposition = "outside",textinfo= "Label + Percent")%>%
  layout(title= "Hour of Wake-up",annotations = 
           list(x = 1, y = -0.1, text = "Source: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency", 
                showarrow = F, xref='paper', yref='paper', 
                xanchor='right', yanchor='auto', xshift=0, yshift=0,
                font=list(size=15, color="black")))
The hours that individuals wake-up has a wider distribution that that of the bedtime hours with 5:00 am and 7:00 am being the time for most.

Conclusion

In sum it can be concluded that the best sleep routine can be attained by waking at sunrise, exercising often, refraining from excessive alcohol, caffeine, and tobacco.