Introduction

This project uses the Sleep Health and Lifestyle Dataset.
It contains data for 374 individuals, including their age, gender, occupation, physical activity, stress level, and sleep patterns.

The goal of this analysis is to explore how occupation relates to sleep health, specifically how different types of jobs relate to sleep duration, sleep quality, and stress.

In this report, I calculated some summary statistics and created five different visualizations to show patterns in the data. Each visualization highlights a unique aspect of the relationship between occupation and sleep.

Loading and Exploring the Data

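# Packages used throughout the report
library(dplyr)    # %>%, filter, group_by, summarise, mutate
library(ggplot2)  # plotting
library(forcats)  # fct_reorder

# Prefer same-folder CSV; fallback to common Apporto path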
csv_candidates <- c(
  "Sleep_health_and_lifestyle_dataset.csv",
  "~/Documents/R/library/Sleep_health_and_lifestyle_dataset.csv"
)
csv_path <- csv_candidates[file.exists(csv_candidates)][1]
stopifnot(!is.na(csv_path))

df <- read.csv(csv_path, check.names = FALSE)
names(df) <- make.names(names(df))
df$Sleep.Disorder <- ifelse(is.na(df$Sleep.Disorder) | df$Sleep.Disorder == "", "None", df$Sleep.Disorder)

df <- df %>%
  filter(!is.na(Occupation),
         !is.na(Sleep.Duration),
         !is.na(Quality.of.Sleep),
         !is.na(Stress.Level))

knitr::kable(head(df[, c("Occupation","Sleep.Duration","Quality.of.Sleep","Stress.Level","Physical.Activity.Level")], 8))
Occupation            Sleep.Duration  Quality.of.Sleep  Stress.Level  Physical.Activity.Level
Software Engineer                6.1                 6             6                       42
Doctor                           6.2                 6             8                       60
Doctor                           6.2                 6             8                       60
Sales Representative             5.9                 4             8                       30
Sales Representative             5.9                 4             8                       30
Software Engineer                5.9                 4             8                       30
Teacher                          6.3                 6             7                       40
Doctor                           7.8                 7             6                       75

From this, we can see the dataset includes a mix of numeric and categorical variables such as Sleep.Duration, Quality.of.Sleep, Stress.Level, Physical.Activity.Level, and Occupation. There are also health indicators like Heart.Rate, BMI.Category, and Sleep.Disorder, which help explain lifestyle differences between groups.

Descriptive Statistics

stats_df <- df %>%
  summarise(
    Sleep.Duration_mean = mean(Sleep.Duration),
    Sleep.Duration_sd   = sd(Sleep.Duration),
    Quality_mean        = mean(Quality.of.Sleep),
    Quality_sd          = sd(Quality.of.Sleep),
    Stress_mean         = mean(Stress.Level),
    Stress_sd           = sd(Stress.Level),
    Activity_mean       = mean(Physical.Activity.Level, na.rm = TRUE),
    Steps_mean          = mean(Daily.Steps, na.rm = TRUE),
    HeartRate_mean      = mean(Heart.Rate, na.rm = TRUE)
  )
knitr::kable(stats_df, digits = 2)
Sleep.Duration_mean  Sleep.Duration_sd  Quality_mean  Quality_sd  Stress_mean  Stress_sd  Activity_mean  Steps_mean  HeartRate_mean
               7.13                0.8          7.31         1.2         5.39       1.77          59.17     6816.84           70.17

On average, people in this dataset sleep about 7.1 hours per night (SD 0.8), with an average sleep quality rating of about 7.3 out of 10 (SD 1.2). Stress levels vary more widely, averaging 5.4 with an SD of 1.77 on the 1–10 scale. The dataset covers a range of activity levels and job types.
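The mean and SD alone do not show the range of each variable. A minimal sketch of reporting minimum and maximum alongside them is given here, using base R only. The `toy` data frame and the `describe` helper are made up for illustration and are not part of the original analysis.

```r
# Toy data for illustration only; these are not rows from the real dataset.
toy <- data.frame(
  Sleep.Duration   = c(6.1, 6.2, 5.9, 7.8, 8.0),
  Quality.of.Sleep = c(6, 6, 4, 7, 9),
  Stress.Level     = c(6, 8, 8, 6, 3)
)

# Hypothetical helper: min, mean, SD, and max for one numeric vector.
describe <- function(x) {
  c(min  = min(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE))
}

# One column per variable, one row per statistic.
summary_tbl <- round(sapply(toy, describe), 2)
print(summary_tbl)
```

Applied to the real `df`, the same call on the numeric columns would show the observed stress range directly instead of assuming it spans the full scale.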

Visualization 1: Sleep Duration by Occupation

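# Minimum number of rows an occupation needs to appear in per-occupation plots.
# Assumed value: this report does not show where min_n is set.
min_n <- 5

df_filtered <- df %>% group_by(Occupation) %>% filter(n() >= min_n) %>% ungroup()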

ggplot(df_filtered, aes(x = fct_reorder(Occupation, Sleep.Duration, median, .desc = TRUE),
                        y = Sleep.Duration)) +
  geom_violin(fill = "#7fcdbb", alpha = 0.8) +
  geom_boxplot(width = 0.15, outlier.shape = 21, fill = "white") +
  labs(title = "Sleep Duration by Occupation",
       x = "Occupation",
       y = "Sleep Duration (hours)") +
  coord_flip() +
  theme_minimal()

Interpretation:
This chart shows the distribution of sleep duration for each occupation. The width of a violin shows how common a given sleep duration is, while its length, together with the box inside, shows how much sleep time varies. Some occupations, such as doctors and sales representatives, show shorter and less consistent sleep, while engineers and managers tend to have steadier, longer sleep patterns.
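Terms such as "steadier" and "longer" can be backed by numbers. The sketch below computes the median and interquartile range (IQR) of sleep duration per occupation in base R. The occupation labels and values are invented for illustration.

```r
# Toy data for illustration only; "Job A" and "Job B" are hypothetical groups.
toy <- data.frame(
  Occupation     = c("Job A", "Job A", "Job A", "Job B", "Job B", "Job B"),
  Sleep.Duration = c(6.0, 6.5, 7.0, 7.0, 7.1, 7.2)
)

# Median = typical sleep duration; IQR = spread of the middle 50% of values.
med <- tapply(toy$Sleep.Duration, toy$Occupation, median)
iqr <- tapply(toy$Sleep.Duration, toy$Occupation, IQR)

spread <- data.frame(
  Occupation = names(med),
  median     = as.numeric(med),
  iqr        = as.numeric(iqr)
)
print(spread)
```

In this toy example "Job B" has both the higher median and the smaller IQR, which is the pattern the text describes as longer and steadier sleep.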

Visualization 2: Sleep Disorder Prevalence by Occupation

df %>%
  group_by(Occupation, Sleep.Disorder) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(Occupation) %>%
  mutate(pct = count / sum(count)) %>%
  ungroup() %>%
  ggplot(aes(x = fct_reorder(Occupation, pct, max), y = pct, fill = Sleep.Disorder)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Sleep Disorder Prevalence by Occupation",
       x = "Occupation", y = "Percent within Occupation") +
  theme_minimal()

Interpretation:
This bar chart shows what percentage of people in each occupation report no sleep disorder, insomnia, or sleep apnea. Healthcare and sales jobs show a slightly higher rate of disorders than occupations such as engineering or office work. One possible explanation is that stressful or irregular work schedules are linked to poor sleep health, although the dataset does not record work schedules directly.
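The within-occupation percentages can also be read from a table rather than a chart. A minimal base R sketch, using invented rows, builds a count table and converts each row to proportions with `prop.table`.

```r
# Toy data for illustration only; not rows from the real dataset.
toy <- data.frame(
  Occupation     = c("Job A", "Job A", "Job A", "Job A", "Job B", "Job B"),
  Sleep.Disorder = c("None", "None", "Insomnia", "Sleep Apnea", "None", "Insomnia")
)

# Rows are occupations, columns are disorder categories.
counts <- table(toy$Occupation, toy$Sleep.Disorder)

# margin = 1 divides each row by its own total, so each occupation sums to 1.
pct <- prop.table(counts, margin = 1)
print(round(pct, 2))
```

Printing `counts` next to `pct` is a useful habit, because a large percentage in a small group may rest on only one or two people.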

Visualization 3: Average Stress Level by Occupation

df %>%
  group_by(Occupation) %>%
  summarise(Average_Stress = mean(Stress.Level), n = n(), .groups = "drop") %>%
  filter(n >= min_n) %>%
  arrange(Average_Stress) %>%
  ggplot(aes(x = Average_Stress, y = fct_reorder(Occupation, Average_Stress))) +
  geom_segment(aes(xend = 0, yend = Occupation), color = "gray80") +
  geom_point(size = 3, color = "#2c7fb8") +
  labs(title = "Average Stress Level by Occupation",
       x = "Average Stress Level (1–10)", y = "Occupation") +
  theme_minimal()

Interpretation:
This chart shows which occupations report higher average stress. Doctors and sales representatives tend to have higher average stress levels, while teachers and engineers have lower ones. This difference may partly explain why sleep disorders and short sleep durations are more common in high-stress professions.

Visualization 4: Physical Activity vs Sleep Quality by Occupation

df %>%
  group_by(Occupation) %>%
  summarise(Activity = mean(Physical.Activity.Level),
            Quality  = mean(Quality.of.Sleep),
            Sleep    = mean(Sleep.Duration),
            n        = n(), .groups = "drop") %>%
  filter(n >= min_n) %>%
  ggplot(aes(x = Activity, y = Quality, size = n, color = Sleep)) +
  geom_point(alpha = 0.8) +
  scale_color_gradient(low = "#feb24c", high = "#2b8cbe") +
  labs(title = "Physical Activity vs. Sleep Quality by Occupation",
       x = "Average Physical Activity Level",
       y = "Average Sleep Quality",
       size = "Sample Size",
       color = "Avg Sleep (hours)") +
  theme_minimal()

Interpretation:
The plot suggests a positive relationship: occupations with more active workers generally report higher sleep quality and longer average sleep durations. This is consistent with the common idea that staying physically active improves overall sleep health, although averages across occupations cannot by themselves establish cause and effect.
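The strength of a relationship like this can be summarized with a correlation coefficient. The sketch below uses Spearman's rank correlation, which is an assumption on my part rather than something the original analysis computed. It suits a handful of occupation-level averages because it only relies on their ordering. The values are invented for illustration.

```r
# Toy occupation-level averages for illustration only.
toy <- data.frame(
  Occupation = c("Job A", "Job B", "Job C", "Job D"),
  Activity   = c(30, 45, 60, 75),
  Quality    = c(5.5, 6.5, 7.0, 8.0)
)

# Spearman correlation: +1 means the two rankings agree perfectly,
# -1 means they are exactly reversed.
rho <- cor(toy$Activity, toy$Quality, method = "spearman")
print(rho)
```

With only a few occupations the coefficient is sensitive to any single group, so it is best read together with the scatter plot rather than in place of it.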

Visualization 5: Heatmap of Sleep, Stress, and Activity by Occupation

metrics <- df %>%
  group_by(Occupation) %>%
  summarise(Sleep = mean(Sleep.Duration),
            Quality = mean(Quality.of.Sleep),
            Stress = mean(Stress.Level),
            Activity = mean(Physical.Activity.Level),
            Steps = mean(Daily.Steps),
            n = n(), .groups = "drop") %>%
  filter(n >= min_n) %>%
  tidyr::pivot_longer(cols = Sleep:Steps, names_to = "Metric", values_to = "Value") %>%
  # Standardize within each metric so that Steps (in the thousands) does not
  # swamp the other metrics on the shared fill scale
  group_by(Metric) %>%
  mutate(Value = as.numeric(scale(Value))) %>%
  ungroup()

ggplot(metrics, aes(x = Metric, y = fct_reorder(Occupation, Value), fill = Value)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "#fdd49e", high = "#31a354") +
  labs(title = "Heatmap of Sleep and Lifestyle Metrics by Occupation",
       x = "Metric", y = "Occupation", fill = "Standardized Average (z-score)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Interpretation:
The heatmap gives an overall snapshot of how each job ranks across several health metrics. Darker green cells indicate higher averages. For sleep duration, sleep quality, activity, and steps, higher is better; for stress, higher is worse. We can see patterns where high-activity jobs also score better on sleep quality, and high-stress jobs show the opposite.

Conclusion

This analysis suggests that occupation is associated with sleep health in this dataset. High-stress jobs, such as those in healthcare and sales, show worse sleep patterns and more reported disorders. Meanwhile, occupations such as engineering and management are linked to better sleep duration and quality. Work schedules are not recorded in the data, so the role of irregular or predictable hours remains a hypothesis.

Key takeaways:
- Physical activity appears to support better sleep outcomes.
- Stress management plays a major role in sleep health.
- Employers can help by promoting balanced work schedules and wellness programs.