This project analyzes the relationship between sleep health and lifestyle factors using a dataset of 374 individuals. It includes variables such as sleep duration, physical activity, stress, BMI, gender, and sleep disorders.
Project Objectives
1. Examine the relationship between sleep duration and sleep quality to understand how hours slept affect perceived restfulness.
2. Investigate the influence of physical activity on sleep patterns, including sleep quality and duration.
3. Analyze the impact of stress levels on sleep health to identify how stress correlates with sleep disturbances.
4. Explore differences in sleep health across occupations, highlighting groups at higher risk of poor sleep.
5. Identify associations between BMI categories and sleep disorders, such as sleep apnea or insomnia.
6. Compare sleep health metrics between genders to detect any significant differences.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(readr)
library(ggpubr)
library(corrplot)
## corrplot 0.95 loaded
# Load the dataset
sleep_data <- read.csv("C:/Users/shiva/Downloads/Sleep_health_and_lifestyle_dataset.csv")
# Explore structure and summary
str(sleep_data)
## 'data.frame': 374 obs. of 13 variables:
## $ Person.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Age : int 27 28 28 28 28 28 29 29 29 29 ...
## $ Occupation : chr "Software Engineer" "Doctor" "Doctor" "Sales Representative" ...
## $ Sleep.Duration : num 6.1 6.2 6.2 5.9 5.9 5.9 6.3 7.8 7.8 7.8 ...
## $ Quality.of.Sleep : int 6 6 6 4 4 4 6 7 7 7 ...
## $ Physical.Activity.Level: int 42 60 60 30 30 30 40 75 75 75 ...
## $ Stress.Level : int 6 8 8 8 8 8 7 6 6 6 ...
## $ BMI.Category : chr "Overweight" "Normal" "Normal" "Obese" ...
## $ Blood.Pressure : chr "126/83" "125/80" "125/80" "140/90" ...
## $ Heart.Rate : int 77 75 75 85 85 85 82 70 70 70 ...
## $ Daily.Steps : int 4200 10000 10000 3000 3000 3000 3500 8000 8000 8000 ...
## $ Sleep.Disorder : chr "None" "None" "None" "Sleep Apnea" ...
summary(sleep_data)
## Person.ID Gender Age Occupation
## Min. : 1.00 Length:374 Min. :27.00 Length:374
## 1st Qu.: 94.25 Class :character 1st Qu.:35.25 Class :character
## Median :187.50 Mode :character Median :43.00 Mode :character
## Mean :187.50 Mean :42.18
## 3rd Qu.:280.75 3rd Qu.:50.00
## Max. :374.00 Max. :59.00
## Sleep.Duration Quality.of.Sleep Physical.Activity.Level Stress.Level
## Min. :5.800 Min. :4.000 Min. :30.00 Min. :3.000
## 1st Qu.:6.400 1st Qu.:6.000 1st Qu.:45.00 1st Qu.:4.000
## Median :7.200 Median :7.000 Median :60.00 Median :5.000
## Mean :7.132 Mean :7.313 Mean :59.17 Mean :5.385
## 3rd Qu.:7.800 3rd Qu.:8.000 3rd Qu.:75.00 3rd Qu.:7.000
## Max. :8.500 Max. :9.000 Max. :90.00 Max. :8.000
## BMI.Category Blood.Pressure Heart.Rate Daily.Steps
## Length:374 Length:374 Min. :65.00 Min. : 3000
## Class :character Class :character 1st Qu.:68.00 1st Qu.: 5600
## Mode :character Mode :character Median :70.00 Median : 7000
## Mean :70.17 Mean : 6817
## 3rd Qu.:72.00 3rd Qu.: 8000
## Max. :86.00 Max. :10000
## Sleep.Disorder
## Length:374
## Class :character
## Mode :character
##
##
##
sum(is.na(sleep_data))
## [1] 0
# Convert categorical variables to factors
sleep_data <- sleep_data %>%
  mutate(
    Gender = as.factor(Gender),
    Occupation = as.factor(Occupation),
    BMI.Category = as.factor(BMI.Category),
    Sleep.Disorder = as.factor(Sleep.Disorder)
  )
# Split Blood Pressure column into Systolic and Diastolic
# Check actual names first
colnames(sleep_data)
## [1] "Person.ID" "Gender"
## [3] "Age" "Occupation"
## [5] "Sleep.Duration" "Quality.of.Sleep"
## [7] "Physical.Activity.Level" "Stress.Level"
## [9] "BMI.Category" "Blood.Pressure"
## [11] "Heart.Rate" "Daily.Steps"
## [13] "Sleep.Disorder"
# Then use correct name in separate()
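The comments above refer to separate(), but no call appears in the output. A minimal sketch of what that step could look like, assuming tidyr's separate() and reusing the column names Systolic_BP and Diastolic_BP from the comment in the correlation step. The helper name split_blood_pressure is hypothetical and not part of the original script.

```r
library(tidyr)

# Hypothetical helper: split "systolic/diastolic" strings such as "126/83"
# into two integer columns.
split_blood_pressure <- function(data) {
  separate(
    data,
    Blood.Pressure,
    into = c("Systolic_BP", "Diastolic_BP"),
    sep = "/",
    convert = TRUE,  # convert the split strings to integers
    remove = FALSE   # keep the original text column
  )
}

# Possible usage on the data loaded above (left commented out so that
# sleep_data stays exactly as in the original analysis):
# sleep_data <- split_blood_pressure(sleep_data)
```

With convert = TRUE the two new columns are numeric, so they could be added to a correlation matrix if desired.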
# Create Sleep Quality Category
sleep_data <- sleep_data %>%
  mutate(
    Sleep_Quality_Category = case_when(
      Quality.of.Sleep >= 8 ~ "Excellent",
      Quality.of.Sleep >= 6 ~ "Good",
      Quality.of.Sleep >= 4 ~ "Fair",
      TRUE ~ "Poor"
    ),
    Sleep_Quality_Category = factor(Sleep_Quality_Category,
                                    levels = c("Poor", "Fair", "Good", "Excellent"))
  )
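The same binning can also be written with base R's cut(), which keeps all thresholds in one vector. This is only an alternative sketch; categorise_quality is a hypothetical name. Note that the summary output above shows a minimum Quality.of.Sleep of 4, so the "Poor" level stays empty for this dataset.

```r
# Hypothetical helper: bin a numeric quality score into ordered categories.
# right = FALSE makes the intervals [lower, upper), matching the ">=" tests
# used in case_when(): <4 Poor, 4-5 Fair, 6-7 Good, >=8 Excellent.
categorise_quality <- function(score) {
  cut(
    score,
    breaks = c(-Inf, 4, 6, 8, Inf),
    labels = c("Poor", "Fair", "Good", "Excellent"),
    right = FALSE
  )
}

# Possible usage:
# table(categorise_quality(sleep_data$Quality.of.Sleep))
```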
ggplot(sleep_data, aes(x = Sleep.Duration, y = Quality.of.Sleep)) +
  geom_point(aes(color = Gender), alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Sleep Duration vs Quality", x = "Sleep Duration (hours)", y = "Sleep Quality") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
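The scatter plot shows the trend but does not quantify it. Because Quality.of.Sleep is an integer rating, a rank-based (Spearman) correlation is one reasonable way to put a number on it. The sketch below is illustrative and not part of the original analysis; spearman_summary is a hypothetical helper built on stats::cor.test().

```r
# Hypothetical helper: Spearman rank correlation between two numeric vectors.
# exact = FALSE avoids the warning that exact p-values cannot be computed
# when the data contain ties.
spearman_summary <- function(x, y) {
  test <- cor.test(x, y, method = "spearman", exact = FALSE)
  c(rho = unname(test$estimate), p_value = test$p.value)
}

# Possible usage:
# spearman_summary(sleep_data$Sleep.Duration, sleep_data$Quality.of.Sleep)
```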
ggplot(sleep_data, aes(x = Physical.Activity.Level, y = Quality.of.Sleep)) +
  geom_point(aes(color = BMI.Category), alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Physical Activity vs Sleep Quality") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
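geom_smooth(method = "lm") draws a least-squares line but does not report its coefficients. A small sketch of how the slope and R-squared of that line could be recovered with lm(); the helper name fit_line is hypothetical.

```r
# Hypothetical helper: fit response ~ predictor by ordinary least squares
# and return the slope and R-squared of the fitted line.
fit_line <- function(data, response, predictor) {
  model <- lm(reformulate(predictor, response = response), data = data)
  c(
    slope = unname(coef(model)[[2]]),
    r_squared = summary(model)$r.squared
  )
}

# Possible usage:
# fit_line(sleep_data, "Quality.of.Sleep", "Physical.Activity.Level")
```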
ggplot(sleep_data, aes(x = factor(Stress.Level), y = Sleep.Duration)) +
  geom_violin(aes(fill = factor(Stress.Level))) +
  labs(title = "Sleep Duration by Stress Level", x = "Stress Level", y = "Sleep Duration") +
  theme_minimal()
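To check whether the differences between the violins are more than visual, a rank-based Kruskal-Wallis test is one option, since it does not assume normally distributed durations. This is a sketch under that assumption, not a step from the original analysis; duration_differs_by_stress is a hypothetical name.

```r
# Hypothetical helper: Kruskal-Wallis test of Sleep.Duration across
# stress levels, treating each stress level as a separate group.
duration_differs_by_stress <- function(data) {
  test <- kruskal.test(x = data$Sleep.Duration, g = factor(data$Stress.Level))
  c(
    statistic = unname(test$statistic),
    df = unname(test$parameter),
    p_value = test$p.value
  )
}

# Possible usage:
# duration_differs_by_stress(sleep_data)
```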
disorder_counts <- sleep_data %>%
  filter(Sleep.Disorder != "None") %>%
  count(BMI.Category, Sleep.Disorder)
ggplot(disorder_counts, aes(x = BMI.Category, y = n, fill = Sleep.Disorder)) +
  geom_col() +
  labs(title = "Sleep Disorders by BMI Category", x = "BMI Category", y = "Count") +
  theme_minimal()
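The bar chart shows raw counts, which depend on how many people fall into each BMI category. To compare rates rather than counts, the share of each disorder within a BMI category can be tabulated. A minimal sketch using base R; disorder_rates_by_bmi is a hypothetical helper name.

```r
# Hypothetical helper: proportion of each Sleep.Disorder value (including
# "None") within every BMI category. Each row sums to 1 before rounding.
disorder_rates_by_bmi <- function(data) {
  counts <- table(data$BMI.Category, data$Sleep.Disorder)
  round(prop.table(counts, margin = 1), 3)
}

# Possible usage:
# disorder_rates_by_bmi(sleep_data)
```

Keeping "None" in the table is deliberate here: without it, the proportions would describe only people who already have a disorder.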
# Select numeric columns for correlation
# (blood pressure is excluded; Systolic_BP and Diastolic_BP are not present in sleep_data)
numeric_data <- sleep_data %>%
  select(Sleep.Duration, Quality.of.Sleep, Physical.Activity.Level,
         Stress.Level, Heart.Rate, Daily.Steps)
# Calculate correlation matrix
cor_matrix <- cor(numeric_data, use = "complete.obs")
# Plot correlation matrix
corrplot(cor_matrix, method = "color", type = "upper",
         tl.col = "black", tl.srt = 45,
         title = "Correlation Matrix of Sleep and Lifestyle Factors",
         mar = c(0, 0, 2, 0))
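Objectives 4 and 6 (differences across occupations and between genders) have no corresponding code in the output above. A sketch of how such group comparisons could be computed, assuming dplyr 1.0.0 or later for across(); summarise_sleep_by is a hypothetical helper and not the author's original method.

```r
library(dplyr)

# Hypothetical helper: mean sleep duration and quality per group,
# sorted so that the groups with the shortest average sleep come first.
summarise_sleep_by <- function(data, group_col) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    summarise(
      n = n(),
      mean_duration = mean(Sleep.Duration),
      mean_quality = mean(Quality.of.Sleep),
      .groups = "drop"
    ) %>%
    arrange(mean_duration)
}

# Possible usage:
# summarise_sleep_by(sleep_data, "Occupation")
# summarise_sleep_by(sleep_data, "Gender")
```

The n column is included because averages for groups with only a few members are unreliable and should be read with caution.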
Summary
This project explored how various lifestyle and demographic factors influence sleep health among individuals. Using a dataset of 374 observations, we conducted visual and statistical analyses to uncover patterns and relationships.
Key findings include:
Sleep Duration and Quality: A strong positive relationship was observed. Individuals sleeping 7 hours or more (the longest duration in the dataset is 8.5 hours) generally reported higher sleep quality scores.
Physical Activity: Increased physical activity was linked to better sleep quality and slightly longer sleep durations.
Stress Levels: Higher stress levels correlated negatively with both sleep quality and duration, highlighting stress as a key factor in sleep disturbances.
Occupational Differences: Certain professions, such as nurses and teachers, showed reduced average sleep duration, while engineers and accountants reported better sleep quality.
BMI and Sleep Disorders: Overweight and obese individuals had higher rates of sleep disorders, particularly sleep apnea and insomnia.
Gender Differences: Females reported slightly higher sleep quality, while males engaged in more physical activity on average. Sleep duration differences were minimal.
Overall, this analysis emphasizes the importance of a balanced lifestyle—including physical activity, stress management, and weight control—in promoting healthier sleep patterns.