About the data: “The dataset contains information about a group of test subjects and their sleep patterns. Each test subject is identified by a unique “Subject ID” and their age and gender are also recorded. The “Bedtime” and “Wakeup time” features indicate when each subject goes to bed and wakes up each day, and the “Sleep duration” feature records the total amount of time each subject slept in hours. The “Sleep efficiency” feature is a measure of the proportion of time spent in bed that is actually spent asleep. The “REM sleep percentage”, “Deep sleep percentage”, and “Light sleep percentage” features indicate the amount of time each subject spent in each stage of sleep. The “Awakenings” feature records the number of times each subject wakes up during the night. Additionally, the dataset includes information about each subject’s caffeine and alcohol consumption in the 24 hours prior to bedtime, their smoking status, and their exercise frequency.”
For more information on the data and its author, Kaggle user Equilibriumm, visit https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency
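As a small illustration of the “Sleep efficiency” definition quoted above (time asleep divided by time in bed), here is a minimal sketch. The helper name `sleep_efficiency` and the example numbers are hypothetical and are not taken from the dataset.

```r
# Hypothetical helper: proportion of time in bed that is actually spent asleep.
sleep_efficiency <- function(hours_asleep, hours_in_bed) {
  # Guard against impossible inputs (no time in bed, or more sleep than time in bed).
  stopifnot(all(hours_in_bed > 0), all(hours_asleep <= hours_in_bed))
  hours_asleep / hours_in_bed
}

# Made-up example: 7 hours asleep out of 8 hours in bed.
sleep_efficiency(7, 8)
# 0.875
```

A value of 0.875 is on the same 0 to 1 scale as the Sleep.efficiency column summarised later.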
#Load packages: dplyr (%>%, mutate, recode) and lubridate (ymd_hms, hour)
library(dplyr)
library(lubridate)
#Load Data
sleep_df <- read.csv("Sleep_Efficiency.csv", row.names = "ID")
#Change Gender and Smoking Status to num; reduce Bedtime and Wakeup time to the hour of day
sleep_df2 <- sleep_df %>%
  mutate(Gender = recode(Gender,
                         "Female" = "0",
                         "Male" = "1"),
         Smoking.status = recode(Smoking.status,
                                 "Yes" = "1",
                                 "No" = "0"),
         Bedtime = hour(ymd_hms(Bedtime)),
         Wakeup.time = hour(ymd_hms(Wakeup.time))) %>%
  mutate(Gender = as.numeric(Gender),
         Smoking.status = as.numeric(Smoking.status)) %>%
  data.frame()
#remove non numericals
#cor(sleep_df[,-c(2,3,4,13)])
#na.replace() is not in base R; loading gtools here is an assumption about the package the author used
library(gtools)
#replace NA with most frequent value for Awakenings
sleep_df2$Awakenings <- na.replace(sleep_df$Awakenings, 1)
#same for Caffeine Consumption
sleep_df2$Caffeine.consumption <- na.replace(sleep_df$Caffeine.consumption, 0)
#same for Alcohol Consumption
sleep_df2$Alcohol.consumption <- na.replace(sleep_df$Alcohol.consumption, 0)
#same for Exercise Frequency
sleep_df2$Exercise.frequency <- na.replace(sleep_df$Exercise.frequency, 0)
summary(sleep_df2)
##  Age             Gender           Bedtime         Wakeup.time
##  Min.   : 9.00   Min.   :0.0000   Min.   : 0.00   Min.   : 3.000
##  1st Qu.:29.00   1st Qu.:0.0000   1st Qu.: 1.00   1st Qu.: 5.000
##  Median :40.00   Median :1.0000   Median : 2.00   Median : 7.000
##  Mean   :40.29   Mean   :0.5044   Mean   :10.66   Mean   : 6.898
##  3rd Qu.:52.00   3rd Qu.:1.0000   3rd Qu.:22.00   3rd Qu.: 9.000
##  Max.   :69.00   Max.   :1.0000   Max.   :23.00   Max.   :12.000
##  Sleep.duration    Sleep.efficiency   REM.sleep.percentage   Deep.sleep.percentage
##  Min.   : 5.000   Min.   :0.5000     Min.   :15             Min.   :20.00
##  1st Qu.: 7.000   1st Qu.:0.6975     1st Qu.:20             1st Qu.:51.25
##  Median : 7.500   Median :0.8200     Median :22             Median :60.00
##  Mean   : 7.466   Mean   :0.7889     Mean   :23             Mean   :52.96
##  3rd Qu.: 8.000   3rd Qu.:0.9000     3rd Qu.:27             3rd Qu.:63.00
##  Max.   :10.000   Max.   :0.9900     Max.   :30             Max.   :75.00
##  Light.sleep.percentage   Awakenings      Caffeine.consumption
##  Min.   : 7.00            Min.   :0.000   Min.   :  0.00
##  1st Qu.:15.00            1st Qu.:1.000   1st Qu.:  0.00
##  Median :18.00            Median :1.000   Median :  0.00
##  Mean   :24.83            Mean   :1.613   Mean   : 22.35
##  3rd Qu.:27.25            3rd Qu.:3.000   3rd Qu.: 50.00
##  Max.   :56.00            Max.   :4.000   Max.   :200.00
##  Alcohol.consumption   Smoking.status   Exercise.frequency
##  Min.   :0.000         Min.   :0.0000   Min.   :0.000
##  1st Qu.:0.000         1st Qu.:0.0000   1st Qu.:0.000
##  Median :0.000         Median :0.0000   Median :2.000
##  Mean   :1.201         Mean   :0.3562   Mean   :1.768
##  3rd Qu.:2.000         3rd Qu.:1.0000   3rd Qu.:3.000
##  Max.   :5.000         Max.   :1.0000   Max.   :5.000
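The imputation step earlier hard-codes the replacement values (1 for Awakenings, 0 for the other three columns). As a minimal base-R sketch of deriving the most frequent value from the data instead, the example below uses a made-up vector. The helper names `most_frequent` and `fill_na_with_mode` are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: most frequent non-missing value of a vector.
# If several values tie, the one that appears first is returned.
most_frequent <- function(x) {
  x <- x[!is.na(x)]
  values <- unique(x)
  values[which.max(tabulate(match(x, values)))]
}

# Hypothetical helper: replace every NA with the most frequent observed value.
fill_na_with_mode <- function(x) {
  x[is.na(x)] <- most_frequent(x)
  x
}

# Made-up example vector, not taken from the dataset.
awakenings <- c(1, NA, 0, 1, 3, NA, 1, 2)
fill_na_with_mode(awakenings)
# 1 1 0 1 3 1 1 2
```

Computing the value this way keeps the imputation in step with the data if the file changes, at the cost of an arbitrary choice when two values are equally common.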
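#redo correlation matrix
#cor(sleep_df2[,-c(3,4)])
#pairs plot (columns 3 and 4, Bedtime and Wakeup.time, are excluded)
pairs(sleep_df2[, -c(3, 4)])
#highlights: correlation matrix drawn with the numeric coefficients
library(corrplot)
corrplot(cor(sleep_df2[, -c(3, 4)]), method = 'number')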
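A correlation matrix can be hard to scan when it has many variables. One possible way to read it is to take the column for a single variable and sort the others by absolute correlation. The sketch below does this on a small made-up data frame: the object `toy` and its values are invented for illustration (only the column names mirror the dataset), so the numbers say nothing about the real data.

```r
# Made-up toy data; values are invented, only the column names mirror the dataset.
toy <- data.frame(
  Sleep.efficiency   = c(0.90, 0.85, 0.70, 0.60, 0.95),
  Awakenings         = c(0, 1, 3, 4, 0),
  Exercise.frequency = c(3, 2, 1, 0, 4)
)

# Pearson correlation matrix (the default method of cor()).
cor_mat <- cor(toy)

# Correlations of every other variable with Sleep.efficiency, strongest first.
target <- cor_mat[, "Sleep.efficiency"]
target <- target[names(target) != "Sleep.efficiency"]
ranked <- target[order(abs(target), decreasing = TRUE)]
ranked
```

Sorting by absolute value keeps strong negative associations near the top instead of pushing them to the end of the list.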
TO BE CONTINUED