sleep_health_dataset <- read.csv("sleep_health_dataset.csv")
How Lifestyle & Behavioral Habits Affect Sleep Quality and Daily Performance
Introduction
Sleep plays a critical role in overall health and daily functioning. However, many people sleep poorly because of stress, excessive screen time, long work hours, or late caffeine consumption. Insufficient sleep can cause fatigue, poor focus, and decreased performance throughout the day.
This project examines how lifestyle and behavioral habits influence sleep quality and daily performance. Through analyzing the dataset, we aim to identify which factors contribute to better or worse sleep outcomes, as well as examine the relationships among different variables. This understanding can help individuals make better choices and improve their daily lives.
Project Goal
The purpose of this project is to study how lifestyle and behavioral habits affect sleep quality and daily performance.
Data
We obtained a sleep health dataset from Kaggle (Sleep_Health_Dataset) with 100,000 records. This is a synthetic dataset, meaning the data was generated but designed to reflect real-world patterns based on research studies. It includes variables related to sleep, lifestyle habits, and performance, such as sleep quality, stress, screen time, and cognitive performance. Each row represents one individual’s daily data.
library(tidyverse)
library(knitr)
data.frame(Variable_Names = names(sleep_health_dataset)) |>
knitr::kable(
caption = "Variable Names in Sleep Health Dataset"
)
| Variable_Names |
|---|
| person_id |
| age |
| gender |
| occupation |
| bmi |
| country |
| sleep_duration_hrs |
| sleep_quality_score |
| rem_percentage |
| deep_sleep_percentage |
| sleep_latency_mins |
| wake_episodes_per_night |
| caffeine_mg_before_bed |
| alcohol_units_before_bed |
| screen_time_before_bed_mins |
| exercise_day |
| steps_that_day |
| nap_duration_mins |
| stress_score |
| work_hours_that_day |
| chronotype |
| mental_health_condition |
| heart_rate_resting_bpm |
| sleep_aid_used |
| shift_work |
| room_temperature_celsius |
| weekend_sleep_diff_hrs |
| season |
| day_type |
| cognitive_performance_score |
| sleep_disorder_risk |
| felt_rested |
You can explore the data using the search box. Only the first 10,000 rows are shown, because the full 100,000 rows are too many for an interactive data table.
library(DT)
datatable(head(sleep_health_dataset, 10000))
Analysis
This project analyzes the data across three main areas:
Sleep quality and duration
Relationships between sleep quality and lifestyle behaviors such as stress, screen time, caffeine use, exercise, and mental health state
A deeper exploration of how sleep impacts cognitive performance
Target Variable Analysis
To better understand our main variables of focus, sleep_quality_score and cognitive_performance_score, we summarize the dataset by reporting the total number of records and variables, along with the average, minimum, maximum, and median values for both variables. We also examine their distributions. The results show that the average overall sleep quality score is 4.87 with a median of 4.9 while the average cognitive performance score is 59.23 with a median of 60.4.
library(tidyverse)
library(knitr)
options(scipen = 999)
Row1 <- sleep_health_dataset |>
summarize(
Records = as.numeric(n()),
Variables = ncol(sleep_health_dataset),
Mean = mean(sleep_quality_score, na.rm = TRUE),
Min = min(sleep_quality_score, na.rm = TRUE),
Median = median(sleep_quality_score, na.rm = TRUE),
Max = max(sleep_quality_score, na.rm = TRUE)
)
Row2 <- sleep_health_dataset |>
summarize(
Records = as.numeric(n()),
Variables = ncol(sleep_health_dataset),
Mean = mean(cognitive_performance_score, na.rm = TRUE),
Min = min(cognitive_performance_score, na.rm = TRUE),
Median = median(cognitive_performance_score, na.rm = TRUE),
Max = max(cognitive_performance_score, na.rm = TRUE)
)
summary_table <- data.frame(rbind(Row1, Row2))
rownames(summary_table) <- c("sleep_quality_score", "cognitive_performance_score")
summary_table |>
knitr::kable(
caption = "Summary Statistics for Target Variables",
digits = 2
)
| | Records | Variables | Mean | Min | Median | Max |
|---|---|---|---|---|---|---|
| sleep_quality_score | 100000 | 32 | 4.87 | 1 | 4.9 | 10 |
| cognitive_performance_score | 100000 | 32 | 59.23 | 0 | 60.4 | 100 |
Data Visualization
1. Sleep Quality Histogram
library(ggplot2)
sleep_health_dataset |>
filter(!is.na(sleep_quality_score)) |>
ggplot(aes(x = sleep_quality_score)) +
geom_histogram(bins = 20, fill = "darkblue") +
labs(
title = "Distribution of Sleep Quality Score",
x = "Sleep Quality Score",
y = "Frequency"
)
This histogram shows the distribution of sleep quality scores across people in the dataset. The x-axis represents the sleep quality score and the y-axis represents the frequency. The distribution is roughly centered on the middle values, with most observations falling between about 4 and 7. This suggests that the majority of individuals experience moderate sleep quality. Scores at the lower and higher ends do occur, but far fewer people fall in those ranges, so very poor or very high sleep quality is less common.
2. Cognitive Performance Histogram
sleep_health_dataset |>
filter(!is.na(cognitive_performance_score)) |>
ggplot(aes(x = cognitive_performance_score)) +
geom_histogram(bins = 20, fill = "plum") +
labs(
title = "Distribution of Cognitive Performance Score",
x = "Cognitive Performance Score",
y = "Frequency"
)
This histogram shows how cognitive performance scores are distributed. Most values fall in the middle range between 50 and 80, which means many individuals have moderate to high cognitive performance. The distribution is slightly skewed to the left: the bulk of the values sits toward the higher end, with a longer tail of low scores. Overall, most individuals tend to have fairly good cognitive performance, with fewer people scoring at the very low or very high ends.
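One way to check a left-skew reading numerically is to compare the mean with the median and compute a moment-based skewness. The sketch below uses simulated scores (`toy_scores` is a hypothetical stand-in, not the Kaggle data) and base R only; with the real data the same lines could be applied to `cognitive_performance_score`.

```r
# Toy, left-skewed scores on a 0-100 scale (assumption: simulated, not the Kaggle data)
set.seed(1)
toy_scores <- 100 * rbeta(2000, shape1 = 5, shape2 = 2)

# In a left-skewed distribution the long lower tail pulls the mean below the median
score_mean <- mean(toy_scores)
score_median <- median(toy_scores)

# Moment-based sample skewness: negative values indicate left skew
skewness <- mean((toy_scores - score_mean)^3) / sd(toy_scores)^3

print(round(c(mean = score_mean, median = score_median, skewness = skewness), 2))
```

This matches the pattern in the summary table, where the mean cognitive performance score (59.23) is slightly below the median (60.4).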
3. Stress vs Sleep Quality
sleep_health_dataset |>
filter(!is.na(stress_score),
!is.na(sleep_quality_score)) |>
ggplot(aes(x = stress_score, y = sleep_quality_score)) +
geom_point(color = "cadetblue", size = 1.5, alpha = 0.1) +
geom_smooth(color= "red", se = FALSE) +
labs(
title = "Stress Score vs Sleep Quality Score",
x = "Stress Score",
y = "Sleep Quality Score"
) +
scale_x_continuous(breaks = seq(0, 10, 2)) +
scale_y_continuous(breaks = seq(0, 10, 2))
This scatter plot shows the relationship between stress score and sleep quality score. There is a clear downward trend: as stress increases, sleep quality tends to decrease. The trend line makes this relationship easier to see. Even though the points are spread out, the overall pattern still shows a clear negative relationship between stress and sleep quality.
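A visual trend like this can be backed up with a correlation coefficient and the slope of a simple linear fit. The following is a minimal sketch on simulated data (the `toy` data frame and its coefficients are assumptions made for illustration); with the real data, the same calls could be run on `sleep_health_dataset`.

```r
# Toy data standing in for the real columns (assumption: simulated, not the Kaggle data)
set.seed(42)
toy <- data.frame(stress_score = runif(500, min = 0, max = 10))
toy$sleep_quality_score <- 8 - 0.5 * toy$stress_score + rnorm(500, sd = 1.5)

# Pearson correlation, using only rows where both values are present
r <- cor(toy$stress_score, toy$sleep_quality_score, use = "complete.obs")

# Slope of a simple linear fit: expected change in sleep quality per stress point
fit <- lm(sleep_quality_score ~ stress_score, data = toy)
slope <- unname(coef(fit)["stress_score"])

print(round(c(correlation = r, slope = slope), 2))
```

A correlation near -1 would support calling the relationship strong, while a value closer to 0 would suggest a weaker association despite a visible trend line.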
4. Sleep Quality Score by Mental Health Condition
ggplot(sleep_health_dataset,
aes(x = mental_health_condition, y = sleep_quality_score,
fill = mental_health_condition)) +
geom_boxplot() +
labs(
title = "Sleep Quality Score by Mental Health Condition",
x = NULL,
y = "Sleep Quality Score",
fill = "Mental Health Condition"
) +
theme_minimal() +
scale_y_continuous(breaks = seq(0, 10, 2)) +
scale_x_discrete(limits = c("Healthy", "Anxiety", "Depression", "Both")) +
scale_fill_discrete(limits = c("Healthy", "Anxiety", "Depression", "Both"))
From the stratified boxplot, we can see that mental health condition is clearly associated with sleep quality. Healthy adults have a median sleep quality score of around 5.3/10, while those with Anxiety or Depression have a median of about 4.1/10. Those with both anxiety and depression have an even lower median sleep quality score of about 3.3/10.
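Medians read off a boxplot are approximate, so it can help to compute them directly per group. Here is a small base R sketch on made-up values (the `toy` data frame is hypothetical and only mirrors the column names used above).

```r
# Made-up example rows (assumption: illustrative values, not the Kaggle data)
toy <- data.frame(
  mental_health_condition = c("Healthy", "Healthy", "Healthy",
                              "Anxiety", "Anxiety",
                              "Both", "Both", "Both"),
  sleep_quality_score = c(5.0, 5.5, 6.0, 4.0, 4.4, 3.0, 3.2, NA)
)

# Median sleep quality per condition; the formula interface drops NA rows by default
group_medians <- aggregate(
  sleep_quality_score ~ mental_health_condition,
  data = toy,
  FUN = median
)

# Sort from best to worst median sleep quality
group_medians <- group_medians[order(-group_medians$sleep_quality_score), ]
print(group_medians)
```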
5. Sleep Quality Score by Occupation
sleep_health_dataset |>
filter(occupation %in% c("Student", "Nurse", "Software Engineer", "Teacher", "Freelancer", "Homemaker", "Lawyer", "Retired")) |>
ggplot(aes(x = sleep_quality_score,
color = occupation)) +
geom_density(linewidth = 1.2) +
labs(
title = "Distribution of Sleep Quality Score by Occupation",
x = "Sleep Quality Score",
y = "Density",
color = "Occupation"
) +
scale_color_discrete(limits = c("Lawyer", "Nurse", "Student", "Software Engineer", "Teacher", "Homemaker", "Freelancer", "Retired"))
The density plot shows that sleep quality varies across occupations (8 of the 12 are shown here). Retired individuals tend to have the highest sleep quality, followed by homemakers and freelancers. Among the occupations shown, nurses, lawyers, and students have the lowest sleep quality. This suggests that occupation may influence sleep quality through differences in lifestyle and work demands.
occupation_summary <- sleep_health_dataset |>
group_by(occupation) |>
summarize(
mean_sleep_quality_score = mean(sleep_quality_score, na.rm = TRUE),
median_sleep_quality_score = median(sleep_quality_score,na.rm = TRUE),
count = n()
) |>
arrange(desc(mean_sleep_quality_score))
knitr::kable(
occupation_summary,
caption = "Mean & Median Sleep Quality Score by Occupation"
)
| occupation | mean_sleep_quality_score | median_sleep_quality_score | count |
|---|---|---|---|
| Retired | 6.620679 | 6.7 | 7036 |
| Homemaker | 5.678372 | 5.8 | 5923 |
| Freelancer | 5.652751 | 5.8 | 7016 |
| Teacher | 5.137629 | 5.3 | 8047 |
| Software Engineer | 5.028455 | 5.1 | 12068 |
| Sales | 4.892333 | 5.0 | 7017 |
| Manager | 4.677077 | 4.7 | 8101 |
| Student | 4.576022 | 4.6 | 14851 |
| Driver | 4.303073 | 4.3 | 6996 |
| Doctor | 4.255084 | 4.2 | 7868 |
| Nurse | 4.111317 | 4.1 | 10073 |
| Lawyer | 4.004616 | 3.9 | 5004 |
Table showing mean & median sleep quality scores for occupations, supporting the density plot above.
6. Cognitive Performance by Caffeine Intake Before Bed
sleep_health_dataset |>
mutate(caffeine_group = case_when(
caffeine_mg_before_bed == 0 ~ "None",
caffeine_mg_before_bed <= 100 ~ "Low (1-100)",
caffeine_mg_before_bed <= 300 ~ "Moderate (101-300)",
TRUE ~ "High (300+)"
)) |>
ggplot(aes(x = caffeine_group,
y = cognitive_performance_score,
fill = caffeine_group)) +
geom_boxplot() +
labs(
title = "Cognitive Performance by Caffeine Intake Before Bed",
x = "Caffeine Group",
y = "Cognitive Performance Score"
) +
scale_x_discrete(limits = c("None", "Low (1-100)", "Moderate (101-300)", "High (300+)")) +
scale_fill_discrete(limits = c("None", "Low (1-100)", "Moderate (101-300)", "High (300+)"))
In the stratified boxplots above, we can see that as caffeine intake before bed increases, cognitive performance score decreases: the median cognitive performance score drops gradually as we move through the higher caffeine intake groups. However, the boxplots overlap considerably, so the effect exists but is not strong.
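The caffeine groups can also be built with base R's `cut()`, which makes the bin edges explicit. The sketch below uses a handful of made-up intake values (`caffeine_mg` is a hypothetical vector) and assumes intake is never negative. One difference from a `case_when()` with a `TRUE ~` fallback is that a missing intake stays `NA` here instead of landing in the "High" group.

```r
# Made-up intake values in mg, including bin edges and a missing value
caffeine_mg <- c(0, 50, 100, 101, 300, 301, NA)

# Right-closed bins: (-Inf, 0], (0, 100], (100, 300], (300, Inf)
bin_edges <- c(-Inf, 0, 100, 300, Inf)
bin_labels <- c("None", "Low (1-100)", "Moderate (101-300)", "High (300+)")

caffeine_group <- cut(caffeine_mg, breaks = bin_edges, labels = bin_labels, right = TRUE)

# Show each value next to its assigned group
print(data.frame(caffeine_mg, caffeine_group))
```

Because `cut()` returns a factor whose levels follow the order of `bin_labels`, the groups would already plot in the intended order without setting axis limits by hand.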
7. Sleep Quality vs. Screen Time Before Bed
sleep_health_dataset |>
sample_n(5000) |>
mutate(age_cat = case_when(
age < 18 ~ "Under 18",
age < 25 ~ "18 - 24",
age < 35 ~ "25 - 34",
age < 45 ~ "35 - 44",
age < 55 ~ "45 - 54",
age >= 55 ~ "55+"
)) |>
filter(!is.na(screen_time_before_bed_mins),
!is.na(sleep_quality_score),
!is.na(age)) |>
ggplot(aes(x = screen_time_before_bed_mins, y = sleep_quality_score, color = age_cat)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Screen Time Before Bed (in minutes)",
y = "Sleep Quality Score",
color = "Age"
)
8. Sleep Quality by Room Temperature
sleep_health_dataset |>
sample_n(5000) |>
filter(!is.na(room_temperature_celsius),
!is.na(sleep_quality_score),
!is.na(gender)) |>
ggplot(aes(x = room_temperature_celsius, y = sleep_quality_score, color = gender)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Room Temperature (°C)",
y = "Sleep Quality Score",
color = "Gender"
)
9. Sleep Quality versus Step Count by Exercise Day
sleep_health_dataset |>
sample_n(5000) |>
mutate(
exercise_day_2 = case_when(
exercise_day == 0 ~ "Inactive Day",
exercise_day == 1 ~ "Active Day"
)
) |>
filter(!is.na(steps_that_day),
!is.na(sleep_quality_score)) |>
ggplot(aes(x = steps_that_day, y = sleep_quality_score,
color = as.factor(exercise_day_2))) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Number of Steps",
y = "Sleep Quality Score",
color = "Exercise Day"
)
10. Stress vs Cognitive Performance by Mental Health Status
sleep_health_dataset |>
sample_n(5000) |>
ggplot(aes(x = stress_score,
y = cognitive_performance_score,
color = mental_health_condition)) +
geom_point(alpha = 0.3) +
geom_smooth(se = FALSE) +
labs(
title = "Stress vs Cognitive Performance by Mental Health",
x = "Stress Score",
y = "Cognitive Performance Score",
color = 'Mental Health Diagnosis'
)
Interactive Scatterplot of Sleep versus Performance by Mental Health Condition
library(plotly)
plot_ly(
data = sleep_health_dataset,
x = ~sleep_quality_score,
y = ~cognitive_performance_score,
color = ~mental_health_condition,
type = "scatter",
mode = "markers",
text = ~paste(
"Sleep:", sleep_quality_score,
"<br>Performance:", cognitive_performance_score,
"<br>Stress:", stress_score
),
hoverinfo = "text"
) %>%
layout(
title = "Sleep vs Performance by Mental Health",
xaxis = list(title = "Sleep Quality Score", range = c(0, 10)),
yaxis = list(title = "Cognitive Performance Score")
)
The scatter plot shows a strong positive relationship between sleep quality score and cognitive performance score: the higher the sleep quality score, the higher the cognitive performance score tends to be. Points are colored by mental health condition, and hovering over a point shows that person's stress score.
Interactive Scatter Plot of Sleep Quality vs Sleep Duration (in hours) by Country
plot_ly(
data = sleep_health_dataset,
x = ~sleep_duration_hrs,
y = ~sleep_quality_score,
color = ~country,
type = "scatter",
mode = "markers",
text = ~paste(
"Sleep Length (hours):", sleep_duration_hrs,
"<br>Sleep Quality:", sleep_quality_score,
"<br>Country:", country
),
hoverinfo = "text"
) %>%
layout(
title = "Sleep Time vs Quality by Country",
xaxis = list(title = "Sleep Duration (hours)"),
yaxis = list(title = "Sleep Quality Score", range = c(0, 10), autorange = FALSE)
)