Executive summary

The Sleep Health and Lifestyle Dataset shows that adult sleep quality is not determined only by sleep duration. Instead, it seems to be shaped by several lifestyle factors working together, including physical activity, stress, and BMI.

Q1 showed that sleep duration alone cannot fully explain sleep quality.
Even among people who slept the “optimal” 6.5 to 8 hours, sleep quality ranged widely, from 5 to 8 points. Likewise, some people with short sleep durations (under 6.5 hours) still reported fairly good sleep (6 to 7 points). Duration by itself is therefore not the key factor behind good sleep.
Q2 showed how multiple lifestyle factors individually influence sleep quality.
People with moderate–high physical activity (61 to 80) had the highest sleep quality, while very low activity was linked to clearly poorer sleep. Stress showed the reverse trend: mean sleep quality dropped sharply as stress increased, from above 8 at low stress (level 3) to the mid-5 range at high stress (levels 7 and 8). BMI showed a similar decline: people with a Normal BMI slept best, and sleep quality was lower in the Overweight and Obese groups.
Q3 revealed the most important pattern in this dataset: physical activity appears to buffer the negative effect of stress on sleep.
Although sleep quality generally worsened as stress increased, it remained relatively stable among people with high physical activity. When stress was low, physical activity mattered little, but as stress increased, activity level became crucial for explaining differences in sleep quality. In particular, moderate–high physical activity (61 to 80) showed the strongest ‘buffering effect’.

Overall, these findings suggest that lifestyle factors are interconnected rather than acting alone. Regular physical activity stands out as a particularly effective way to improve sleep. People who engaged in moderate–high activity were less likely to experience high stress, and exercise seems to provide a “double benefit”: supporting sleep directly and reducing the negative impact of stress. Because the dataset is synthetic, further studies using real-world data are needed; even so, regular physical activity appears to be a more practical strategy for maintaining healthy sleep than relying only on stress control.

Data background

The Sleep Health and Lifestyle Dataset is a synthetic dataset created by a student, Laksika Tharmalingam, at the University of Moratuwa for illustrative and educational purposes. Comprising 400 rows and 13 columns, the dataset covers a wide range of variables related to sleep and daily habits. The variables fall into four categories: sleep metrics (duration, quality, and patterns); lifestyle factors (physical activity, stress, and BMI); cardiovascular health (blood pressure and heart rate); and sleep disorders (presence of insomnia and sleep apnea).

Data cleaning

The original dataset was loaded using read_csv(), followed by an examination of the overall variable structure. To streamline the analysis, the rename() function was applied to standardize variable names into a concise format. Subsequently, select() was used to extract only the sleep metrics and major lifestyle variables. This process simplified the data structure, facilitating efficient visualization and analysis.

Load raw data

library(tidyverse)  # provides read_csv(), the dplyr verbs, and ggplot2

sleep_raw <- read_csv("Sleep_health_and_lifestyle_dataset.csv")

Clean and rename variables

sleep_renamed <- sleep_raw %>%
  rename(
    sleep_duration   = `Sleep Duration`,
    sleep_quality    = `Quality of Sleep`,
    physical_activity = `Physical Activity Level`,
    stress_level     = `Stress Level`,
    bmi_category     = `BMI Category`
  ) %>%
  select(
    sleep_duration,
    sleep_quality,
    physical_activity,
    stress_level,
    bmi_category
  )
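Before plotting, it can help to confirm that the selected columns contain no missing values and to see which BMI labels actually occur. The following is a minimal sketch, not part of the original analysis: it runs on a small made-up tibble (`toy`, with invented values rather than rows from the dataset) that has the same column names as `sleep_renamed`. On the real data, `sleep_renamed` would take the place of `toy`.

```r
library(dplyr)

# Hypothetical toy rows that mimic the structure of sleep_renamed;
# the values are invented for illustration only.
toy <- tibble(
  sleep_duration    = c(6.1, 7.2, 7.8, 8.1, NA),
  sleep_quality     = c(6, 8, 8, 9, 7),
  physical_activity = c(30, 60, 75, 90, 45),
  stress_level      = c(8, 4, 3, 3, 6),
  bmi_category      = c("Overweight", "Normal", "Normal Weight", "Normal", "Obese")
)

# Count missing values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Tabulate the raw BMI labels to spot near-duplicate categories
bmi_counts <- toy %>%
  count(bmi_category, sort = TRUE)

na_counts
bmi_counts
```

A tabulation like `bmi_counts` is one way to notice that labels such as "Normal" and "Normal Weight" coexist and may need to be merged.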

Individual figures

Figure 1: Scatterplot

To examine the correlation between sleep duration and sleep quality, sleep duration was categorized into ‘Short,’ ‘Medium,’ and ‘Long’ intervals, and a scatter plot was drawn for each interval. The Medium interval (6.5 to 8 hours) is built around the commonly cited ‘optimal sleep duration’ of 7 to 8 hours, which makes it possible to check whether this range coincides with peak sleep quality in the actual data. Particular attention was paid to cases where individuals reported low quality despite adequate sleep, or maintained similar quality despite short sleep duration. These observations highlighted the need to investigate the potential influence of other lifestyle factors beyond sleep duration alone.

Q1. Does longer sleep duration necessarily lead to higher sleep quality?

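sleep_1 <- sleep_renamed %>%
  mutate(
    duration_group = ifelse(sleep_duration < 6.5, "Short (~ 6.5h)",
                     ifelse(sleep_duration <= 8, "Medium (6.5 ~ 8h)",
                                         "Long (8h ~ )")),
    # Order the levels so the facets run Short, Medium, Long rather than alphabetically
    duration_group = factor(duration_group,
                            levels = c("Short (~ 6.5h)", "Medium (6.5 ~ 8h)", "Long (8h ~ )"))
  )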

p_q1 <- ggplot(data=sleep_1,
       mapping = aes(x = sleep_duration, y = sleep_quality)) 

p_q1 + 
  geom_point(alpha = 0.3, size = 1.5) +
  geom_smooth(method = "loess", se = FALSE) +
  facet_wrap(~ duration_group, ncol = 1) +
  labs(
    title = "Q1. Effect of Sleep Duration on Sleep Quality",
    x = "Sleep Duration (hours)",
    y = "Sleep Quality (1-10)"
  )
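The spread of sleep quality within each duration group can also be checked numerically rather than read off the scatter plots. The sketch below is an illustration under stated assumptions: `toy` is a made-up tibble with invented values, the group labels are shortened, and only the cut-offs (below 6.5 hours, 6.5 to 8 hours, above 8 hours) follow `sleep_1`.

```r
library(dplyr)

# Hypothetical toy rows (invented values, not taken from the dataset)
toy <- tibble(
  sleep_duration = c(6.0, 6.2, 6.5, 7.0, 7.5, 8.0, 8.2, 8.5),
  sleep_quality  = c(6,   7,   5,   6,   8,   8,   9,   9)
)

quality_by_duration <- toy %>%
  mutate(
    # Same cut-offs as in sleep_1: below 6.5 h, 6.5 to 8 h, above 8 h
    duration_group = case_when(
      sleep_duration < 6.5 ~ "Short",
      sleep_duration <= 8  ~ "Medium",
      TRUE                 ~ "Long"
    ),
    duration_group = factor(duration_group, levels = c("Short", "Medium", "Long"))
  ) %>%
  group_by(duration_group) %>%
  summarise(
    n            = n(),
    min_quality  = min(sleep_quality),
    max_quality  = max(sleep_quality),
    mean_quality = mean(sleep_quality)
  )

quality_by_duration
```

A wide gap between `min_quality` and `max_quality` inside a single group is the numeric counterpart of the vertical scatter seen in the corresponding panel.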

Figure 2: Bar graphs by variable

These graphs were generated to examine the relationship between Sleep Quality and the three predictors defined in Q2: Physical Activity Level, Stress Level, and BMI Category. Physical activity was first recoded into four ordered groups (up to 40, 41 to 60, 61 to 80, and above 80), and stress level was converted to an ordered factor. BMI categories were standardized by merging ‘Normal Weight’ and ‘Normal’ into a single ‘Normal’ group, leaving three categories: Normal, Overweight, and Obese. Three summary datasets were then created by calculating the mean sleep quality at each level of each variable, and these were used to produce the bar charts. The charts allow a direct comparison of mean values across the levels of each predictor and show how each behavioral factor relates to sleep quality.

Q2. How do behavioral factors (physical activity, stress level, BMI) individually relate to sleep quality?

sleep_2 <- sleep_renamed %>%
  mutate(
    physical_activity_group = cut(
      physical_activity,
      breaks = c(-Inf, 40, 60, 80, Inf),
      labels = c("~ 40", "41 ~ 60", "61 ~ 80", "81 ~"),
      right = TRUE),
    physical_activity_group = factor(physical_activity_group, ordered = TRUE),
    
    stress_level = factor(stress_level, ordered = TRUE),

    bmi_category = case_when(
      bmi_category %in% c("Normal", "Normal Weight") ~ "Normal", TRUE ~ bmi_category),
    bmi_category = factor(bmi_category,
                          levels = c("Normal", "Overweight", "Obese"))
  )
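The binning in `sleep_2` relies on `cut()` with `right = TRUE`, which makes each interval closed on the right, so the boundary values 40, 60, and 80 fall into the lower group. The short sketch below illustrates this on a handful of made-up activity values (the vector `activity` is hypothetical); the breaks and labels are the ones used above.

```r
# Hypothetical activity values chosen to sit on and around the bin boundaries
activity <- c(30, 40, 41, 60, 61, 80, 81, 90)

activity_group <- cut(
  activity,
  breaks = c(-Inf, 40, 60, 80, Inf),
  labels = c("~ 40", "41 ~ 60", "61 ~ 80", "81 ~"),
  right = TRUE  # intervals are (-Inf, 40], (40, 60], (60, 80], (80, Inf)
)

data.frame(activity, activity_group)
```

Note that the labels assume whole-number activity values; a non-integer value such as 40.5 would be placed in the "41 ~ 60" group.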

p_a_summary <- sleep_2 %>%
  group_by(physical_activity_group) %>%
  summarise(mean_sleep_quality = mean(sleep_quality, na.rm = TRUE)) %>%
  mutate(
    predictor = "Physical Activity",
    level = physical_activity_group
  ) %>%
  select(predictor, level, mean_sleep_quality)

s_l_summary <- sleep_2 %>%
  group_by(stress_level) %>%
  summarise(mean_sleep_quality = mean(sleep_quality, na.rm = TRUE)) %>%
  mutate(
    predictor = "Stress Level",
    level = stress_level
  ) %>%
  select(predictor, level, mean_sleep_quality)

bmi_summary <- sleep_2 %>%
  group_by(bmi_category) %>%
  summarise(mean_sleep_quality = mean(sleep_quality, na.rm = TRUE)) %>%
  mutate(
    predictor = "BMI Group",
    level = bmi_category
  ) %>%
  select(predictor, level, mean_sleep_quality)

Q2-A. Physical Activity

p_q2a <- ggplot(data = p_a_summary,
  mapping = aes(x = level, y = mean_sleep_quality))

p_q2a + 
  geom_col(fill = "skyblue", width = 0.7) +
  coord_cartesian(ylim = c(5.5, 8.5)) +
  labs( title = "Q2-A. Effect of Physical Activity on Sleep Quality",
        x = "Physical Activity Category",
        y = "Mean Sleep Quality"
  ) +
  geom_text(
    aes(label = round(mean_sleep_quality, 2)),
    vjust = -0.5  # place the value label just above the bar
  )

Q2-B. Stress Level

p_q2b <- ggplot(data = s_l_summary,
  mapping = aes(x = level, y = mean_sleep_quality))

p_q2b +
  geom_col(fill = "pink", width = 0.7) +
  coord_cartesian(ylim = c(5.5, 8.5)) +
  labs( title = "Q2-B. Effect of Stress Level on Sleep Quality",
        x = "Stress Level",
        y = "Mean Sleep Quality"
  ) +
  geom_text(
    aes(label = round(mean_sleep_quality, 2)),
    vjust = -0.5  # place the value label just above the bar
  )

Q2-C. BMI Category

p_q2c <- ggplot(data = bmi_summary,
  mapping = aes(x = level, y = mean_sleep_quality))

p_q2c +
  geom_col(fill = "orange", width = 0.7) +
  coord_cartesian(ylim = c(5.5, 8.5)) +
  labs( title = "Q2-C. Effect of BMI Category on Sleep Quality",
        x = "BMI Category",
        y = "Mean Sleep Quality"
  ) +
  geom_text(
    aes(label = round(mean_sleep_quality, 2)),
    vjust = -0.5  # place the value label just above the bar
  )

Figure 3: Line graph

This line graph was generated to visualize the interaction between Physical Activity Level and Stress Level in relation to Sleep Quality. The differences in slope illustrate the rate of decline in sleep quality as stress increases, highlighting whether this negative effect is mitigated within the high physical activity group. This visualization presents the shifts in sleep quality across stress levels as a continuous pattern, facilitating a clear comparison of the trajectories defined by physical activity levels.

Q3. How do physical activity and stress level jointly affect sleep quality?

q3_data <- sleep_2 %>%
  group_by(stress_level, physical_activity_group) %>%
  summarise(
    mean_sleep_quality = mean(sleep_quality, na.rm = TRUE),
    n = n(),
    .groups = "drop"
  )

q3 <- ggplot(data = q3_data,
       mapping = aes(
         x = stress_level,
         y = mean_sleep_quality,
         group = physical_activity_group,
         color = physical_activity_group))

q3 +
  geom_line(linewidth = 1.2) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  scale_color_manual(
    values = c("#d2e5f3", "#94b7d6", "#005b96", "#03396c")
  ) + 
  labs(
    title = "Q3. Effect of Physical Activity × Stress on Sleep Quality",
    x = "Stress Level",
    y = "Mean Sleep Quality",
    color = "Physical Activity"
  ) +
  theme_minimal(base_size = 14)
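
The slope differences that the line graph shows visually can also be summarized as one number per activity group. The sketch below is only an illustration, not the method used in this report: it assumes a straight-line relationship between stress and sleep quality within each group, and it runs on a made-up tibble (`toy`, with invented values and hypothetical group names). The slope is computed as `cov(x, y) / var(x)`, which equals the least-squares slope of a simple linear fit.

```r
library(dplyr)

# Hypothetical toy data (invented values): two activity groups whose
# sleep quality falls with stress at different rates.
toy <- tibble(
  activity_group = rep(c("low", "high"), each = 4),
  stress_level   = rep(c(3, 5, 7, 8), times = 2),
  sleep_quality  = c(8, 6, 4, 3,           # low activity: 1 point lost per stress level
                     8.25, 7.75, 7.25, 7)  # high activity: 0.25 points lost per stress level
)

slopes <- toy %>%
  group_by(activity_group) %>%
  summarise(
    # Least-squares slope of sleep quality on stress within each group
    stress_slope = cov(stress_level, sleep_quality) / var(stress_level),
    n = n()
  )

slopes
```

A slope closer to zero corresponds to a flatter line, that is, a weaker decline in sleep quality as stress rises. To apply this to `sleep_2`, the ordered factor `stress_level` would first need to be converted back to numbers, for example with `as.numeric(as.character(stress_level))`, and groups with a small `n` should be read with caution.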