Introduction

This project analyzes the relationship between sleep health and lifestyle factors using a dataset of 374 individuals. It includes variables such as sleep duration, physical activity, stress, BMI, gender, and sleep disorders.

Project Objectives

1. Examine the relationship between sleep duration and sleep quality to understand how hours slept affect perceived restfulness.

2. Investigate the influence of physical activity on sleep patterns, including sleep quality and duration.

3. Analyze the impact of stress levels on sleep health to identify how stress correlates with sleep disturbances.

4. Explore differences in sleep health across occupations, highlighting groups at higher risk of poor sleep.

5. Identify associations between BMI categories and sleep disorders, such as sleep apnea or insomnia.

6. Compare sleep health metrics between genders to detect any significant differences.

Load Libraries and Data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(readr)
library(ggpubr)
library(corrplot)
## corrplot 0.95 loaded
# Load the dataset
sleep_data <- read.csv("C:/Users/shiva/Downloads/Sleep_health_and_lifestyle_dataset.csv")

# Explore structure and summary
str(sleep_data)
## 'data.frame':    374 obs. of  13 variables:
##  $ Person.ID              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Gender                 : chr  "Male" "Male" "Male" "Male" ...
##  $ Age                    : int  27 28 28 28 28 28 29 29 29 29 ...
##  $ Occupation             : chr  "Software Engineer" "Doctor" "Doctor" "Sales Representative" ...
##  $ Sleep.Duration         : num  6.1 6.2 6.2 5.9 5.9 5.9 6.3 7.8 7.8 7.8 ...
##  $ Quality.of.Sleep       : int  6 6 6 4 4 4 6 7 7 7 ...
##  $ Physical.Activity.Level: int  42 60 60 30 30 30 40 75 75 75 ...
##  $ Stress.Level           : int  6 8 8 8 8 8 7 6 6 6 ...
##  $ BMI.Category           : chr  "Overweight" "Normal" "Normal" "Obese" ...
##  $ Blood.Pressure         : chr  "126/83" "125/80" "125/80" "140/90" ...
##  $ Heart.Rate             : int  77 75 75 85 85 85 82 70 70 70 ...
##  $ Daily.Steps            : int  4200 10000 10000 3000 3000 3000 3500 8000 8000 8000 ...
##  $ Sleep.Disorder         : chr  "None" "None" "None" "Sleep Apnea" ...
summary(sleep_data)
##    Person.ID         Gender               Age         Occupation       
##  Min.   :  1.00   Length:374         Min.   :27.00   Length:374        
##  1st Qu.: 94.25   Class :character   1st Qu.:35.25   Class :character  
##  Median :187.50   Mode  :character   Median :43.00   Mode  :character  
##  Mean   :187.50                      Mean   :42.18                     
##  3rd Qu.:280.75                      3rd Qu.:50.00                     
##  Max.   :374.00                      Max.   :59.00                     
##  Sleep.Duration  Quality.of.Sleep Physical.Activity.Level  Stress.Level  
##  Min.   :5.800   Min.   :4.000    Min.   :30.00           Min.   :3.000  
##  1st Qu.:6.400   1st Qu.:6.000    1st Qu.:45.00           1st Qu.:4.000  
##  Median :7.200   Median :7.000    Median :60.00           Median :5.000  
##  Mean   :7.132   Mean   :7.313    Mean   :59.17           Mean   :5.385  
##  3rd Qu.:7.800   3rd Qu.:8.000    3rd Qu.:75.00           3rd Qu.:7.000  
##  Max.   :8.500   Max.   :9.000    Max.   :90.00           Max.   :8.000  
##  BMI.Category       Blood.Pressure       Heart.Rate     Daily.Steps   
##  Length:374         Length:374         Min.   :65.00   Min.   : 3000  
##  Class :character   Class :character   1st Qu.:68.00   1st Qu.: 5600  
##  Mode  :character   Mode  :character   Median :70.00   Median : 7000  
##                                        Mean   :70.17   Mean   : 6817  
##                                        3rd Qu.:72.00   3rd Qu.: 8000  
##                                        Max.   :86.00   Max.   :10000  
##  Sleep.Disorder    
##  Length:374        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
sum(is.na(sleep_data))
## [1] 0

Data Cleaning and Transformation

# Convert categorical variables to factors
sleep_data <- sleep_data %>%
  mutate(
    Gender = as.factor(Gender),
    Occupation = as.factor(Occupation),
    BMI.Category = as.factor(BMI.Category),
    Sleep.Disorder = as.factor(Sleep.Disorder)
  )

# Split Blood Pressure column into Systolic and Diastolic
# Check actual names first
colnames(sleep_data)
##  [1] "Person.ID"               "Gender"                 
##  [3] "Age"                     "Occupation"             
##  [5] "Sleep.Duration"          "Quality.of.Sleep"       
##  [7] "Physical.Activity.Level" "Stress.Level"           
##  [9] "BMI.Category"            "Blood.Pressure"         
## [11] "Heart.Rate"              "Daily.Steps"            
## [13] "Sleep.Disorder"
# Then use correct name in separate()
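
The comments above point to a `separate()` step for the blood pressure column. A minimal sketch of one way to do that split, assuming `tidyr::separate()` with `sep = "/"` and the output names `Systolic_BP` and `Diastolic_BP` (assumed names). It runs on a four-row toy data frame, `bp_demo` (a hypothetical name, with values copied from the `str()` output above), rather than on `sleep_data` itself:

```r
library(tidyr)

# Toy data frame (hypothetical name) with readings copied from the str() output
bp_demo <- data.frame(
  Person.ID = 1:4,
  Blood.Pressure = c("126/83", "125/80", "125/80", "140/90")
)

# Split the "systolic/diastolic" text into two integer columns.
# Systolic_BP and Diastolic_BP are assumed names, not confirmed by the source.
bp_split <- separate(
  bp_demo,
  col = Blood.Pressure,
  into = c("Systolic_BP", "Diastolic_BP"),
  sep = "/",
  convert = TRUE  # convert the resulting strings to integers
)

bp_split
```

Applied to `sleep_data`, the same call would replace `Blood.Pressure` with two numeric columns that could then be summarised or correlated like the other measurements.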

# Create Sleep Quality Category
sleep_data <- sleep_data %>%
  mutate(Sleep_Quality_Category = case_when(
    Quality.of.Sleep >= 8 ~ "Excellent",
    Quality.of.Sleep >= 6 ~ "Good",
    Quality.of.Sleep >= 4 ~ "Fair",
    TRUE ~ "Poor"  # scores below 4; none occur in this dataset (minimum is 4)
  ),
  Sleep_Quality_Category = factor(Sleep_Quality_Category,
                                  levels = c("Poor", "Fair", "Good", "Excellent")))

Exploratory Data Analysis

Sleep Duration vs Quality

ggplot(sleep_data, aes(x = Sleep.Duration, y = Quality.of.Sleep)) +
  geom_point(aes(color = Gender), alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Sleep Duration vs Quality", x = "Sleep Duration (hours)", y = "Sleep Quality") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
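
The scatter plot shows the direction of the relationship but not its strength. A minimal sketch of how that strength might be quantified, using only the first ten rows of the two columns as printed in the `str()` output (the data frame name `first_rows` is hypothetical, and ten rows are not representative of the full dataset):

```r
# First ten rows of the two columns, copied from the str() output above
first_rows <- data.frame(
  Sleep.Duration   = c(6.1, 6.2, 6.2, 5.9, 5.9, 5.9, 6.3, 7.8, 7.8, 7.8),
  Quality.of.Sleep = c(6, 6, 6, 4, 4, 4, 6, 7, 7, 7)
)

# Pearson correlation: strength of the linear association (range -1 to 1)
r_pearson <- cor(first_rows$Sleep.Duration, first_rows$Quality.of.Sleep)

# Spearman rank correlation: a common choice when one variable is an ordinal score
r_spearman <- cor(first_rows$Sleep.Duration, first_rows$Quality.of.Sleep,
                  method = "spearman")

# Linear model whose fitted line is what geom_smooth(method = "lm") draws
fit <- lm(Quality.of.Sleep ~ Sleep.Duration, data = first_rows)

c(pearson = r_pearson, spearman = r_spearman, slope = unname(coef(fit)[2]))
```

Replacing `first_rows` with `sleep_data` would give the coefficients for all 374 observations.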

Physical Activity vs Sleep

ggplot(sleep_data, aes(x = Physical.Activity.Level, y = Quality.of.Sleep)) +
  geom_point(aes(color = BMI.Category), alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Physical Activity vs Sleep Quality") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Stress Levels and Sleep

ggplot(sleep_data, aes(x = factor(Stress.Level), y = Sleep.Duration)) +
  geom_violin(aes(fill = factor(Stress.Level))) +
  labs(title = "Sleep Duration by Stress Level", x = "Stress Level", y = "Sleep Duration") +
  theme_minimal()
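
A violin plot shows the shape of each distribution, and a table of group means can make the comparison across stress levels easier to read. A minimal sketch in base R, again using only the first ten rows printed in the `str()` output (`first_rows` and `stress_means` are hypothetical names):

```r
# First ten rows of the two columns, copied from the str() output above
first_rows <- data.frame(
  Stress.Level   = c(6, 8, 8, 8, 8, 8, 7, 6, 6, 6),
  Sleep.Duration = c(6.1, 6.2, 6.2, 5.9, 5.9, 5.9, 6.3, 7.8, 7.8, 7.8)
)

# Mean sleep duration within each stress level
stress_means <- aggregate(Sleep.Duration ~ Stress.Level,
                          data = first_rows, FUN = mean)

stress_means
```

On the full data, the same call with `data = sleep_data` would return one row per stress level from 3 to 8.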

Sleep Disorders by BMI

disorder_counts <- sleep_data %>%
  filter(Sleep.Disorder != "None") %>%
  count(BMI.Category, Sleep.Disorder)

ggplot(disorder_counts, aes(x = BMI.Category, y = n, fill = Sleep.Disorder)) +
  geom_col() +
  labs(title = "Sleep Disorders by BMI Category", x = "BMI Category", y = "Count") +
  theme_minimal()
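
Because the chart drops the `None` rows and plots raw counts, it shows how many disorder cases fall in each BMI category rather than the rate within each category. A minimal sketch of a within-category rate table, using made-up toy rows (the data frame `toy` and all of its values are hypothetical and are not taken from the dataset):

```r
# Hypothetical toy rows for illustration only; not taken from the dataset
toy <- data.frame(
  BMI.Category   = c("Normal", "Normal", "Normal", "Normal",
                     "Overweight", "Overweight", "Overweight", "Obese"),
  Sleep.Disorder = c("None", "None", "None", "Insomnia",
                     "Sleep Apnea", "Insomnia", "None", "Sleep Apnea")
)

# Counts of each disorder status within each BMI category ("None" is kept)
counts <- table(toy$BMI.Category, toy$Sleep.Disorder)

# Row-wise proportions: share of each BMI category with each status
rates <- prop.table(counts, margin = 1)

round(rates, 2)
```

With `sleep_data` in place of `toy`, each row of `rates` sums to 1, so the disorder columns can be compared directly across BMI categories of different sizes.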

Correlation Analysis

# Select numeric columns for correlation (Blood.Pressure is stored as text and is left out)
numeric_data <- sleep_data %>%
  select(Sleep.Duration, Quality.of.Sleep, Physical.Activity.Level,
         Stress.Level, Heart.Rate, Daily.Steps)

# Calculate correlation matrix
cor_matrix <- cor(numeric_data, use = "complete.obs")

# Plot correlation matrix
corrplot(cor_matrix, method = "color", type = "upper",
         tl.col = "black", tl.srt = 45,
         title = "Correlation Matrix of Sleep and Lifestyle Factors",
         mar = c(0, 0, 2, 0))
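
Objectives 4 and 6 call for comparisons across occupations and between genders. A minimal sketch of how such group comparisons might be computed with `dplyr` and a Welch t-test, using made-up toy rows (the data frame `toy`, its values, and the result names `by_occupation` and `gender_test` are all hypothetical):

```r
library(dplyr)

# Hypothetical toy rows for illustration only; not taken from the dataset
toy <- data.frame(
  Gender           = c("Male", "Male", "Male", "Female", "Female", "Female"),
  Occupation       = c("Doctor", "Engineer", "Doctor", "Nurse", "Engineer", "Nurse"),
  Sleep.Duration   = c(6.2, 7.8, 6.4, 6.1, 8.0, 6.3),
  Quality.of.Sleep = c(6, 8, 6, 5, 9, 6)
)

# Objective 4: average sleep metrics per occupation, shortest sleepers first
by_occupation <- toy %>%
  group_by(Occupation) %>%
  summarise(
    n = n(),
    mean_duration = mean(Sleep.Duration),
    mean_quality = mean(Quality.of.Sleep),
    .groups = "drop"
  ) %>%
  arrange(mean_duration)

# Objective 6: Welch two-sample t-test of sleep quality between genders
gender_test <- t.test(Quality.of.Sleep ~ Gender, data = toy)

by_occupation
gender_test$p.value
```

The same two steps run on `sleep_data` would show which occupations sleep least on average and whether the gender difference in sleep quality is statistically distinguishable from zero.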

Conclusion

Summary

This project explored how lifestyle and demographic factors relate to sleep health. Using a dataset of 374 observations, we combined visual exploration with a correlation analysis to uncover patterns and relationships.

Key findings include:

Sleep Duration and Quality: A strong positive relationship was observed. Individuals sleeping 7 hours or more (the maximum in this dataset is 8.5) generally reported higher sleep quality scores.

Physical Activity: Increased physical activity was linked to better sleep quality and slightly longer sleep durations.

Stress Levels: Higher stress levels correlated negatively with both sleep quality and duration, highlighting stress as a key factor in sleep disturbances.

Occupational Differences: Certain professions, such as nurses and teachers, showed reduced average sleep duration, while engineers and accountants reported better sleep quality.

BMI and Sleep Disorders: Overweight and obese individuals had higher rates of sleep disorders, particularly sleep apnea and insomnia.

Gender Differences: Females reported slightly higher sleep quality, while males engaged in more physical activity on average. Sleep duration differences were minimal.

Overall, this analysis emphasizes the importance of a balanced lifestyle, including physical activity, stress management, and weight control, in promoting healthier sleep patterns.