The Sleep Health and Lifestyle Dataset shows that adult sleep quality is not determined only by sleep duration. Instead, it seems to be shaped by several lifestyle factors working together, including physical activity, stress, and BMI.
Q1 showed that sleep duration alone cannot fully explain
sleep quality.
Even when people slept for the “optimal” 6.5 to 8 hours, their sleep
quality still ranged widely, from 5 to 8 points. Likewise, some people
with short sleep durations (under 6.5 hours) still reported fairly good
sleep (6 to 7 points). This means that duration by itself is not enough
to explain good sleep.
Q2 showed how multiple lifestyle factors individually influence
sleep quality.
People with moderate–high physical activity (the 61 to 80 group) had the
highest sleep quality, while very low activity was linked to clearly
poorer sleep. Stress showed the reverse trend: sleep quality dropped
sharply as stress increased, from above 8 at low stress (level 3) to the
mid-5 range at high stress (levels 7 to 8). BMI showed a similar decline:
people with a Normal BMI slept best, and sleep quality was lower in the
Overweight and Obese groups.
Q3 revealed the most important pattern in this dataset: physical
activity appears to buffer the negative effect of stress on
sleep.
Although sleep quality generally worsens as stress increases, it
remained relatively stable among people with high physical activity.
When stress was low, physical activity made little difference, but as
stress increased, activity level became crucial for explaining
differences in sleep quality. In particular, moderate–high physical
activity (the 61 to 80 group) showed the strongest ‘buffering effect’.
Overall, these findings suggest that lifestyle factors are interconnected rather than acting alone. Regular physical activity stands out as the factor most consistently associated with better sleep. People in the moderate–high activity group were less likely to report high stress, and exercise seems to provide a “double benefit”: supporting sleep directly and reducing the negative impact of stress. Because the dataset is synthetic, further studies using real-world data are needed to confirm these patterns. Even so, regular physical activity appears to be a more practical strategy for maintaining healthy sleep than relying only on stress control.
The Sleep Health and Lifestyle Dataset is a synthetic dataset created by a student at the University of Moratuwa for illustrative and educational purposes. Comprising 400 rows and 13 columns, the dataset covers a wide range of variables related to sleep and daily habits. The variables fall into four groups: sleep metrics (duration, quality, and patterns); lifestyle factors (physical activity, stress, and BMI); cardiovascular health (blood pressure and heart rate); and sleep disorders (presence of insomnia and sleep apnea).
The original dataset was loaded using read_csv(), followed by an examination of the overall variable structure. To streamline the analysis, the rename() function was applied to standardize variable names into a concise format. Subsequently, select() was used to extract only the sleep metrics and major lifestyle variables. This process simplified the data structure, facilitating efficient visualization and analysis.
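# Load the tidyverse (readr, dplyr, ggplot2), which the whole analysis relies on
library(tidyverse)

# Read the raw data; the CSV is expected in the working directory
sleep_raw <- read_csv("Sleep_health_and_lifestyle_dataset.csv")

# Shorten the names of the variables of interest, then keep only those columns
sleep_renamed <- sleep_raw %>%
  rename(
    sleep_duration = `Sleep Duration`,
    sleep_quality = `Quality of Sleep`,
    physical_activity = `Physical Activity Level`,
    stress_level = `Stress Level`,
    bmi_category = `BMI Category`
  ) %>%
  select(
    sleep_duration,
    sleep_quality,
    physical_activity,
    stress_level,
    bmi_category
  )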
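The structure check mentioned above is not shown in the code. A minimal sketch of what it might look like follows, using a two-row stand-in tibble (`toy_raw`, with made-up values) in place of the real `sleep_raw`. It also shows that `select()` can rename and subset in a single call, which is one possible alternative to the separate `rename()` step.

```r
library(dplyr)

# Hypothetical stand-in for sleep_raw: original column names, made-up values
toy_raw <- tibble(
  `Person ID` = c(1, 2),
  `Sleep Duration` = c(6.1, 7.8),
  `Quality of Sleep` = c(6, 8),
  `Physical Activity Level` = c(42, 75),
  `Stress Level` = c(6, 3),
  `BMI Category` = c("Overweight", "Normal")
)

# Inspect column names and types before renaming
glimpse(toy_raw)

# select() accepts new_name = old_name, so renaming and subsetting happen together
toy_renamed <- toy_raw %>%
  select(
    sleep_duration = `Sleep Duration`,
    sleep_quality = `Quality of Sleep`,
    physical_activity = `Physical Activity Level`,
    stress_level = `Stress Level`,
    bmi_category = `BMI Category`
  )

# Confirm which columns remain and count missing values per column
names(toy_renamed)
colSums(is.na(toy_renamed))
```

With the real data, the same calls would be applied to `sleep_raw` instead of `toy_raw`.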
To examine the relationship between sleep duration and sleep quality, sleep duration was divided into ‘Short’ (under 6.5 hours), ‘Medium’ (6.5 to 8 hours), and ‘Long’ (over 8 hours) groups, and a scatter plot with one panel per group was generated. The Medium band is built around the commonly cited ‘optimal sleep duration’ of 7 to 8 hours, which makes it possible to check whether this range coincides with peak sleep quality in the data. Particular attention was paid to cases where individuals reported low quality despite adequate sleep, or maintained similar quality despite short sleep duration. These observations highlighted the need to investigate the influence of lifestyle factors beyond sleep duration alone.
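# Assign each observation to a duration group: Short (< 6.5h), Medium (6.5-8h), Long (> 8h)
sleep_1 <- sleep_renamed %>%
  mutate(
    duration_group = ifelse(sleep_duration < 6.5, "Short (~ 6.5h)",
                     ifelse(sleep_duration <= 8, "Medium (6.5 ~ 8h)",
                            "Long (8h ~ )")),
    # Fix the panel order; the default would be alphabetical (Long, Medium, Short)
    duration_group = factor(
      duration_group,
      levels = c("Short (~ 6.5h)", "Medium (6.5 ~ 8h)", "Long (8h ~ )")
    )
  )

p_q1 <- ggplot(data = sleep_1,
               mapping = aes(x = sleep_duration, y = sleep_quality))

# One panel per duration group, with a loess trend line over the raw points
p_q1 +
  geom_point(alpha = 0.3, size = 1.5) +
  geom_smooth(method = "loess", se = FALSE) +
  facet_wrap(~ duration_group, ncol = 1) +
  labs(
    title = "Q1. Effect of Sleep Duration on Sleep Quality",
    x = "Sleep Duration (hours)",
    y = "Sleep Quality (1-10)"
  )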
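The spread of sleep quality within each duration group can also be checked numerically rather than read off the scatter plot. The sketch below is one way to do this; `quality_spread()` is a hypothetical helper that is not part of the original analysis, and the `toy` rows are made-up values used only to show the shape of the output.

```r
library(dplyr)

# Hypothetical helper: spread of sleep quality within each duration group.
# Cut points mirror the analysis: Short < 6.5h, Medium 6.5-8h, Long > 8h.
quality_spread <- function(df) {
  df %>%
    mutate(
      duration_group = case_when(
        sleep_duration < 6.5 ~ "Short",
        sleep_duration <= 8 ~ "Medium",
        TRUE ~ "Long"
      )
    ) %>%
    group_by(duration_group) %>%
    summarise(
      n = n(),
      min_quality = min(sleep_quality),
      median_quality = median(sleep_quality),
      max_quality = max(sleep_quality),
      .groups = "drop"
    )
}

# Made-up rows for illustration only; with the real data, pass sleep_renamed
toy <- tibble(
  sleep_duration = c(6.0, 6.4, 6.5, 7.2, 8.0, 8.3),
  sleep_quality = c(6, 7, 5, 8, 7, 9)
)

spread <- quality_spread(toy)
print(spread)
```

A wide gap between `min_quality` and `max_quality` in the Medium group would correspond to the pattern described for Q1, where quality varies widely even within the recommended duration.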
These graphs were generated to examine the relationship between Sleep Quality and the three predictors considered in Q2: Physical Activity Level, Stress Level, and BMI Category. Physical activity was first recoded into four groups (up to 40, 41 to 60, 61 to 80, and 81 or more), and stress level was converted to an ordered factor. BMI categories were standardized by merging ‘Normal Weight’ and ‘Normal’ into a single ‘Normal’ group, leaving three categories: Normal, Overweight, and Obese. Three summary datasets were then created by calculating the average sleep quality at each level of each variable, and these were used to produce the bar charts. The charts allow the average values to be compared directly across the levels of each predictor and show how each behavioral factor relates to sleep quality.
sleep_2 <- sleep_renamed %>%
  mutate(
    # Bin physical activity into four right-closed intervals
    physical_activity_group = cut(
      physical_activity,
      breaks = c(-Inf, 40, 60, 80, Inf),
      labels = c("~ 40", "41 ~ 60", "61 ~ 80", "81 ~"),
      right = TRUE),
    physical_activity_group = factor(physical_activity_group, ordered = TRUE),
    # Treat stress level as an ordered category rather than a number
    stress_level = factor(stress_level, ordered = TRUE),
    # Merge "Normal Weight" into "Normal", then fix the category order
    bmi_category = case_when(
      bmi_category %in% c("Normal", "Normal Weight") ~ "Normal",
      TRUE ~ bmi_category),
    bmi_category = factor(bmi_category,
                          levels = c("Normal", "Overweight", "Obese"))
  )
p_a_summary <- sleep_2 %>%
  group_by(physical_activity_group) %>%
  summarise(mean_sleep_quality = mean(sleep_quality, na.rm = TRUE)) %>%
  mutate(
    predictor = "Physical Activity",
    level = physical_activity_group
  ) %>%
  select(predictor, level, mean_sleep_quality)

s_l_summary <- sleep_2 %>%
  group_by(stress_level) %>%
  summarise(mean_sleep_quality = mean(sleep_quality, na.rm = TRUE)) %>%
  mutate(
    predictor = "Stress Level",
    level = stress_level
  ) %>%
  select(predictor, level, mean_sleep_quality)

bmi_summary <- sleep_2 %>%
  group_by(bmi_category) %>%
  summarise(mean_sleep_quality = mean(sleep_quality, na.rm = TRUE)) %>%
  mutate(
    predictor = "BMI Group",
    level = bmi_category
  ) %>%
  select(predictor, level, mean_sleep_quality)
# Q2-A: mean sleep quality by physical activity group
p_q2a <- ggplot(data = p_a_summary,
                mapping = aes(x = level, y = mean_sleep_quality))
p_q2a +
  geom_col(fill = "skyblue", width = 0.7) +
  # Zoom the y-axis without dropping data; bar heights no longer start at zero
  coord_cartesian(ylim = c(5.5, 8.5)) +
  labs(
    title = "Q2-A. Effect of Physical Activity on Sleep Quality",
    x = "Physical Activity Category",
    y = "Mean Sleep Quality"
  ) +
  # vjust places each label just above its bar instead of on the bar edge
  geom_text(
    aes(label = round(mean_sleep_quality, 2)),
    vjust = -0.5
  )

# Q2-B: mean sleep quality by stress level
p_q2b <- ggplot(data = s_l_summary,
                mapping = aes(x = level, y = mean_sleep_quality))
p_q2b +
  geom_col(fill = "pink", width = 0.7) +
  coord_cartesian(ylim = c(5.5, 8.5)) +
  labs(
    title = "Q2-B. Effect of Stress Level on Sleep Quality",
    x = "Stress Level",
    y = "Mean Sleep Quality"
  ) +
  geom_text(
    aes(label = round(mean_sleep_quality, 2)),
    vjust = -0.5
  )

# Q2-C: mean sleep quality by BMI category
p_q2c <- ggplot(data = bmi_summary,
                mapping = aes(x = level, y = mean_sleep_quality))
p_q2c +
  geom_col(fill = "orange", width = 0.7) +
  coord_cartesian(ylim = c(5.5, 8.5)) +
  labs(
    title = "Q2-C. Effect of BMI Category on Sleep Quality",
    x = "BMI Category",
    y = "Mean Sleep Quality"
  ) +
  geom_text(
    aes(label = round(mean_sleep_quality, 2)),
    vjust = -0.5
  )
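The three summary tables in Q2 follow the same group-and-average pattern, so they could be produced by a single function. The sketch below shows one possible refactoring; `summarise_by()` is a hypothetical helper that does not appear in the original analysis, and `toy` holds made-up values. It uses the `{{ }}` operator from dplyr's tidy evaluation to pass a column name into `group_by()`.

```r
library(dplyr)

# Hypothetical helper: mean sleep quality at each level of one predictor
summarise_by <- function(df, var, predictor_name) {
  df %>%
    group_by(level = {{ var }}) %>%
    summarise(
      mean_sleep_quality = mean(sleep_quality, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    # Convert levels to character so tables for different predictors can be stacked;
    # note that this drops the factor ordering
    mutate(predictor = predictor_name, level = as.character(level)) %>%
    select(predictor, level, mean_sleep_quality)
}

# Made-up rows for illustration only; with the real data, pass sleep_2
toy <- tibble(
  sleep_quality = c(8, 6, 7, 5),
  stress_level = factor(c(3, 7, 3, 7), ordered = TRUE),
  bmi_category = factor(
    c("Normal", "Obese", "Normal", "Overweight"),
    levels = c("Normal", "Overweight", "Obese")
  )
)

q2_all <- bind_rows(
  summarise_by(toy, stress_level, "Stress Level"),
  summarise_by(toy, bmi_category, "BMI Group")
)
print(q2_all)
```

Keeping the three tables separate, as the analysis does, preserves the ordered factor levels for plotting; the stacked version trades that for less repetition.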
This line graph was generated to visualize the interaction between Physical Activity Level and Stress Level in relation to Sleep Quality. Each line traces mean sleep quality across stress levels for one physical activity group, so differences in slope show how quickly sleep quality declines as stress increases and whether this decline is weaker in the high physical activity groups. The number of observations behind each mean (n) is also computed, which makes it possible to identify means that rest on very few people. Presenting the changes across stress levels as continuous lines makes the trajectories of the activity groups easy to compare.
# Mean sleep quality and cell size for every stress x activity combination
q3_data <- sleep_2 %>%
  group_by(stress_level, physical_activity_group) %>%
  summarise(
    mean_sleep_quality = mean(sleep_quality, na.rm = TRUE),
    n = n(),
    .groups = "drop"
  )

q3 <- ggplot(data = q3_data,
             mapping = aes(
               x = stress_level,
               y = mean_sleep_quality,
               group = physical_activity_group,
               color = physical_activity_group))

# One line per activity group; darker blue indicates higher activity.
# linewidth replaces the size argument for lines (deprecated since ggplot2 3.4.0)
q3 +
  geom_line(linewidth = 1.2) +
  scale_color_manual(
    values = c("#d2e5f3", "#94b7d6", "#005b96", "#03396c")
  ) +
  labs(
    title = "Q3. Effect of Physical Activity × Stress on Sleep Quality",
    x = "Stress Level",
    y = "Mean Sleep Quality",
    color = "Physical Activity"
  ) +
  theme_minimal(base_size = 14)
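The slope comparison described for Q3 can also be expressed as a number per activity group. The sketch below fits a simple linear regression of sleep quality on stress within each group and returns the slope; `stress_slopes()` is a hypothetical helper, not part of the original analysis, and the `toy` values are made up so that the slopes are easy to verify by hand. It assumes stress level is numeric.

```r
# Hypothetical helper: slope of sleep quality on stress within each activity group.
# Assumes stress_level is numeric and every group has at least two stress levels.
stress_slopes <- function(df) {
  groups <- split(df, as.character(df$physical_activity_group))
  sapply(groups, function(g) {
    coef(lm(sleep_quality ~ stress_level, data = g))[["stress_level"]]
  })
}

# Made-up values for illustration only: quality falls by 1 point per stress
# level in the low-activity group and by 0.25 in the moderate-high group
toy <- data.frame(
  physical_activity_group = rep(c("~ 40", "61 ~ 80"), each = 3),
  stress_level = rep(c(3, 5, 7), times = 2),
  sleep_quality = c(8, 6, 4, 8, 7.5, 7)
)

slopes <- stress_slopes(toy)
print(round(slopes, 2))
```

A slope closer to zero means sleep quality declines more slowly as stress rises, which is what a buffering effect would look like. In `sleep_2`, `stress_level` is an ordered factor, so it would need to be converted first, for example with `as.numeric(as.character(stress_level))`.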