Sleep Efficiency Data

About the data: “The dataset contains information about a group of test subjects and their sleep patterns. Each test subject is identified by a unique “Subject ID” and their age and gender are also recorded. The “Bedtime” and “Wakeup time” features indicate when each subject goes to bed and wakes up each day, and the “Sleep duration” feature records the total amount of time each subject slept in hours. The “Sleep efficiency” feature is a measure of the proportion of time spent in bed that is actually spent asleep. The “REM sleep percentage”, “Deep sleep percentage”, and “Light sleep percentage” features indicate the amount of time each subject spent in each stage of sleep. The “Awakenings” feature records the number of times each subject wakes up during the night. Additionally, the dataset includes information about each subject’s caffeine and alcohol consumption in the 24 hours prior to bedtime, their smoking status, and their exercise frequency.”

For more information on the data and its author, Kaggle user Equilibriumm, visit https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency
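The description defines sleep efficiency as the proportion of time in bed that is spent asleep. A minimal sketch of that ratio follows; the two input values and the variable names are hypothetical and are not taken from the dataset.

```r
# Hypothetical values for a single night (not taken from the dataset)
hours_in_bed <- 8.0   # time between going to bed and getting up
hours_asleep <- 7.0   # time actually spent asleep

# Sleep efficiency: proportion of time in bed that is spent asleep
sleep_efficiency <- hours_asleep / hours_in_bed
sleep_efficiency
# [1] 0.875
```

Values near 1 mean that almost all of the time in bed was spent asleep, which matches the 0.50 to 0.99 range of the Sleep.efficiency column in the summary further down.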

Data Prep

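# Load packages (assumed; the chunk that attaches them is not shown in this report)
library(dplyr)      # %>%, mutate(), recode()
library(lubridate)  # ymd_hms(), hour()
library(gtools)     # na.replace() (assumed to be the source of this function)
library(corrplot)   # corrplot()

# Load data, using the ID column as row names
sleep_df <- read.csv("Sleep_Efficiency.csv", row.names = "ID")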

# Recode Gender and Smoking status as 0/1, and reduce Bedtime and Wakeup time to the hour of day
sleep_df2 <- sleep_df %>%
  mutate(Gender = recode(Gender,
                         "Female" = "0",
                         "Male" = "1"),
         Smoking.status = recode(Smoking.status,
                                 "Yes" = "1",
                                 "No" = "0"),
         Bedtime = hour(ymd_hms(Bedtime)),
         Wakeup.time = hour(ymd_hms(Wakeup.time))) %>%
  # Convert the recoded character labels to numeric
  mutate(Gender = as.numeric(Gender),
         Smoking.status = as.numeric(Smoking.status)) %>%
  data.frame()

# Correlation on the raw data after dropping the non-numeric columns
# cor(sleep_df[, -c(2, 3, 4, 13)])
# Replace NA with the most frequent value (1) for Awakenings
sleep_df2$Awakenings <- na.replace(sleep_df2$Awakenings, 1)
# Replace NA with 0 for Caffeine consumption
sleep_df2$Caffeine.consumption <- na.replace(sleep_df2$Caffeine.consumption, 0)
# Replace NA with 0 for Alcohol consumption
sleep_df2$Alcohol.consumption <- na.replace(sleep_df2$Alcohol.consumption, 0)
# Replace NA with 0 for Exercise frequency
sleep_df2$Exercise.frequency <- na.replace(sleep_df2$Exercise.frequency, 0)
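The replacement values above are hardcoded. As a sketch of how a most frequent value could be derived rather than typed in, the base R snippet below works on a small made-up vector; `awakenings`, `most_frequent`, and `awakenings_filled` are illustrative names, not objects from this report.

```r
# Toy vector with missing values (made up for illustration)
awakenings <- c(1, 0, NA, 1, 3, NA, 1, 2)

# Count the non-missing values; table() drops NA by default
counts <- table(awakenings)

# Most frequent value (the first one wins if there is a tie)
most_frequent <- as.numeric(names(counts)[which.max(counts)])

# Fill the gaps with that value
awakenings_filled <- ifelse(is.na(awakenings), most_frequent, awakenings)
awakenings_filled
# [1] 1 0 1 1 3 1 1 2
```

Deriving the value this way keeps the imputation correct if the file is updated, at the cost of a few extra lines per column.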

Data Preview

summary(sleep_df2)  
##       Age            Gender          Bedtime       Wakeup.time    
##  Min.   : 9.00   Min.   :0.0000   Min.   : 0.00   Min.   : 3.000  
##  1st Qu.:29.00   1st Qu.:0.0000   1st Qu.: 1.00   1st Qu.: 5.000  
##  Median :40.00   Median :1.0000   Median : 2.00   Median : 7.000  
##  Mean   :40.29   Mean   :0.5044   Mean   :10.66   Mean   : 6.898  
##  3rd Qu.:52.00   3rd Qu.:1.0000   3rd Qu.:22.00   3rd Qu.: 9.000  
##  Max.   :69.00   Max.   :1.0000   Max.   :23.00   Max.   :12.000  
##  Sleep.duration   Sleep.efficiency REM.sleep.percentage Deep.sleep.percentage
##  Min.   : 5.000   Min.   :0.5000   Min.   :15           Min.   :20.00        
##  1st Qu.: 7.000   1st Qu.:0.6975   1st Qu.:20           1st Qu.:51.25        
##  Median : 7.500   Median :0.8200   Median :22           Median :60.00        
##  Mean   : 7.466   Mean   :0.7889   Mean   :23           Mean   :52.96        
##  3rd Qu.: 8.000   3rd Qu.:0.9000   3rd Qu.:27           3rd Qu.:63.00        
##  Max.   :10.000   Max.   :0.9900   Max.   :30           Max.   :75.00        
##  Light.sleep.percentage   Awakenings    Caffeine.consumption
##  Min.   : 7.00          Min.   :0.000   Min.   :  0.00      
##  1st Qu.:15.00          1st Qu.:1.000   1st Qu.:  0.00      
##  Median :18.00          Median :1.000   Median :  0.00      
##  Mean   :24.83          Mean   :1.613   Mean   : 22.35      
##  3rd Qu.:27.25          3rd Qu.:3.000   3rd Qu.: 50.00      
##  Max.   :56.00          Max.   :4.000   Max.   :200.00      
##  Alcohol.consumption Smoking.status   Exercise.frequency
##  Min.   :0.000       Min.   :0.0000   Min.   :0.000     
##  1st Qu.:0.000       1st Qu.:0.0000   1st Qu.:0.000     
##  Median :0.000       Median :0.0000   Median :2.000     
##  Mean   :1.201       Mean   :0.3562   Mean   :1.768     
##  3rd Qu.:2.000       3rd Qu.:1.0000   3rd Qu.:3.000     
##  Max.   :5.000       Max.   :1.0000   Max.   :5.000
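In the summary above, Bedtime has a median of 2 but a mean of 10.66. This suggests that the hours cluster on both sides of midnight, so the arithmetic mean lands at a time of day when few subjects are likely to go to bed. One possible way to handle this, sketched on made-up hours, is to shift the early-morning values past 24; the cutoff of 12 is an assumption, not something the dataset prescribes.

```r
# Toy bedtime hours on a 24-hour clock (made up for illustration)
bedtime_hour <- c(22, 23, 0, 1, 2, 21)

# Shift hours before noon past 24 so the scale is continuous across midnight.
# The cutoff of 12 is an assumption: earlier hours are treated as "after midnight".
bedtime_shifted <- ifelse(bedtime_hour < 12, bedtime_hour + 24, bedtime_hour)

mean(bedtime_hour)     # 11.5, in the middle of the day
mean(bedtime_shifted)  # 23.5, i.e. 23:30
```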

Correlation Matrix


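# Correlation matrix without Bedtime and Wakeup.time (columns 3 and 4)
# cor(sleep_df2[, -c(3, 4)])
# Pairs plot of the remaining variables
pairs(sleep_df2[, -c(3, 4)])

# Correlation plot with the coefficients printed as numbers
corrplot(cor(sleep_df2[, -c(3, 4)]), method = "number")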
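A full correlation plot can be hard to scan when the interest is in one outcome. The sketch below ranks variables by the absolute value of their correlation with a single column, using base R only. The data frame `toy` is synthetic: its relationships are built in by construction and say nothing about the real dataset.

```r
# Synthetic stand-in data (NOT the sleep dataset); relationships are built in on purpose
set.seed(1)
n <- 50
toy <- data.frame(
  Awakenings = sample(0:4, n, replace = TRUE),
  Deep.sleep.percentage = runif(n, 20, 75),
  Age = sample(18:65, n, replace = TRUE)
)
toy$Sleep.efficiency <- 0.6 + 0.004 * toy$Deep.sleep.percentage -
  0.03 * toy$Awakenings + rnorm(n, sd = 0.02)

# Correlation of every column with the target column
cors <- cor(toy)[, "Sleep.efficiency"]

# Drop the target's correlation with itself, then sort by absolute strength
cors <- cors[names(cors) != "Sleep.efficiency"]
ranked <- cors[order(abs(cors), decreasing = TRUE)]
ranked
```

Applied to the report's data, the same pattern would use `cor(sleep_df2[, -c(3, 4)])` in place of `cor(toy)`.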

TO BE CONTINUED