2025-11-02

The Dataset

- This project uses the Sleep Health and Lifestyle dataset from Kaggle (https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset).

The dataset contains 374 rows and 13 columns, covering sleep patterns, lifestyle factors, and health indicators.

Major variables include Sleep Duration, Sleep Quality, Stress Level, Physical Activity, BMI Category, Blood Pressure, Heart Rate, and Sleep Disorders.
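
The model code later in this deck refers to columns such as Sleep.Duration and Quality.of.Sleep. One likely explanation, assuming the CSV headers contain spaces and the file was loaded with base R's read.csv() and its default check.names = TRUE, is that the spaces were converted to dots. The sketch below uses two made-up rows passed inline through the text argument, not the real file.

```r
# Two made-up rows with headers that contain spaces (not the real Kaggle data).
csv_text <- "Sleep Duration,Quality of Sleep,Stress Level\n6.1,6,6\n7.8,8,3"

# read.csv() defaults to check.names = TRUE, which replaces spaces with dots.
sleep_demo <- read.csv(text = csv_text)

names(sleep_demo)  # column names after conversion
dim(sleep_demo)    # number of rows, number of columns
```

Running dim() on the real data frame in the same way is a quick check of the row and column counts quoted above.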

Agenda

We will explore the dataset through various visualizations to better understand sleep health. The project layout:

Scatter Plot: Shows the relationship between Age and Sleep Duration, colored by Gender.

Box Plot: Compares Sleep Quality across BMI Categories.

3D Plot: Visualizes Age, Sleep Duration, and Sleep Quality together, colored by Gender.

Bar Chart: Shows average Sleep Quality for each Sleep Disorder group.

Statistical Analysis: Linear Regression predicting Sleep Quality, and ANOVA comparing Sleep Quality across Sleep Disorder groups.

GGPlot 1: Sleep Duration vs Age by Gender

This scatter plot shows the relationship between Age and Sleep Duration. Points are colored by Gender, and the linear trend lines show how sleep duration changes with age for each gender group.
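
A plot of this kind could be built roughly as follows. This is a minimal sketch, not the deck's actual plotting code: the data frame holds made-up rows, and the column names Age, Gender, and Sleep.Duration are assumed to match the real data.

```r
library(ggplot2)

# Made-up rows for illustration only (not the Kaggle data).
set.seed(1)
demo <- data.frame(
  Age            = rep(c(28, 35, 42, 50, 57), times = 2),
  Gender         = rep(c("Female", "Male"), each = 5),
  Sleep.Duration = round(runif(10, min = 5.8, max = 8.5), 1)
)

# Points colored by Gender, with one linear trend line per gender group.
p_scatter <- ggplot(demo, aes(x = Age, y = Sleep.Duration, colour = Gender)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Sleep Duration vs Age by Gender",
       x = "Age", y = "Sleep Duration (hours)")

p_scatter
```

Because colour is mapped inside aes(), geom_smooth() fits a separate line for each gender rather than one line for the whole sample.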

GGPlot 2: Sleep Quality by BMI Category

This boxplot compares Sleep Quality across BMI Categories. Each box shows the distribution of sleep quality within that category, which helps show whether sleep quality differs with body weight.
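
A boxplot like this one could be drawn along the following lines. The sketch assumes column names BMI.Category and Quality.of.Sleep, and the rows and category labels are made up for illustration.

```r
library(ggplot2)

# Made-up rows and category labels for illustration only (not the Kaggle data).
demo <- data.frame(
  BMI.Category     = rep(c("Normal", "Overweight", "Obese"), each = 4),
  Quality.of.Sleep = c(8, 7, 8, 9, 6, 7, 6, 5, 6, 5, 4, 6)
)

# One box per BMI category; the legend is dropped because the x axis already labels the groups.
p_box <- ggplot(demo, aes(x = BMI.Category, y = Quality.of.Sleep, fill = BMI.Category)) +
  geom_boxplot() +
  labs(title = "Sleep Quality by BMI Category",
       x = "BMI Category", y = "Sleep Quality") +
  theme(legend.position = "none")

p_box
```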

Plotly 1: 3D Scatter Plot

This 3D scatter plot visualizes Age, Sleep Duration, and Sleep Quality together. Points are colored by Gender, which can reveal joint patterns that are hard to see in 2D plots.
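
A 3D scatter plot of this kind could be produced with plotly roughly as shown here. This is a sketch under assumptions: the rows are made up, and the column names are assumed to match the real data frame.

```r
library(plotly)

# Made-up rows for illustration only (not the Kaggle data).
demo <- data.frame(
  Age              = c(28, 31, 36, 42, 45, 50, 53, 58),
  Sleep.Duration   = c(6.1, 7.8, 6.5, 7.2, 6.9, 6.0, 8.1, 8.3),
  Quality.of.Sleep = c(6, 8, 6, 8, 7, 6, 9, 9),
  Gender           = rep(c("Male", "Female"), times = 4)
)

# type = "scatter3d" with mode = "markers" draws one point per row in 3D space.
fig_3d <- plot_ly(
  demo,
  x = ~Age, y = ~Sleep.Duration, z = ~Quality.of.Sleep,
  color = ~Gender,
  type = "scatter3d", mode = "markers"
)

# Axis titles for a 3D figure live under layout(scene = ...).
fig_3d <- plotly::layout(fig_3d, scene = list(
  xaxis = list(title = "Age"),
  yaxis = list(title = "Sleep Duration"),
  zaxis = list(title = "Sleep Quality")
))

fig_3d
```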

3D Plot Analysis

This 3D scatter plot explores the relationship between Age, Sleep Duration, and Quality of Sleep, with points colored by Gender.

We can see that:

Younger individuals tend to have longer sleep durations and higher sleep quality, shown by the clustering of higher points among lower age values.

As age increases, both sleep duration and sleep quality tend to decrease, though this trend varies slightly between genders.

Males and females show overlapping distributions, with slight differences: females appear to maintain somewhat higher sleep quality for a given sleep duration.

The 3D layout makes it easier to see how these variables vary together, suggesting that age and sleep duration are jointly related to overall sleep quality.

Overall, the plot suggests that sleep quality declines with age and that sleep duration is strongly tied to good sleep health. This is a visual impression only; the regression later in this deck estimates a small positive Age coefficient once Sleep Duration and Stress Level are held constant.

Plotly 2: Average Sleep Quality by Sleep Disorder

This interactive bar chart shows the average Sleep Quality for each Sleep Disorder group. It shows how average sleep quality for people with Insomnia or Sleep Apnea compares with that of people without a disorder.
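
The group averages behind a chart like this can be computed and plotted as sketched below. The rows are made up and the column names Sleep.Disorder and Quality.of.Sleep are assumptions, so the resulting means say nothing about the real data.

```r
library(plotly)

# Made-up rows for illustration only (not the Kaggle data).
demo <- data.frame(
  Sleep.Disorder   = rep(c("None", "Insomnia", "Sleep Apnea"), each = 3),
  Quality.of.Sleep = c(8, 7, 9, 6, 7, 5, 7, 6, 8)
)

# Mean Sleep Quality per Sleep Disorder group (groups come back in alphabetical order).
avg_quality <- aggregate(Quality.of.Sleep ~ Sleep.Disorder, data = demo, FUN = mean)

# One bar per group, with bar height equal to the group mean.
fig_bar <- plot_ly(avg_quality, x = ~Sleep.Disorder, y = ~Quality.of.Sleep, type = "bar")
fig_bar <- plotly::layout(fig_bar,
  title = "Average Sleep Quality by Sleep Disorder",
  xaxis = list(title = "Sleep Disorder"),
  yaxis = list(title = "Average Sleep Quality")
)

fig_bar
```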

Statistical Analysis: Linear Regression

This model predicts Sleep Quality using Sleep Duration, Stress Level, and Age. The results below show how each factor is associated with sleep quality when the other two are held constant.

Linear Regression Results: Predicting Sleep Quality

Variable        Estimate  Std. Error  t-Value  p-Value
(Intercept)        3.587       0.435     8.24  < 0.001
Sleep.Duration     0.677       0.045    15.13  < 0.001
Stress.Level      -0.328       0.021   -15.76  < 0.001
Age                0.016       0.003     5.90  < 0.001

Sleep Duration (+0.677) → Each additional hour of sleep is associated with about 0.68 points higher sleep quality, holding the other predictors constant.

Stress Level (−0.328) → Each one-unit increase in stress is associated with roughly 0.33 points lower sleep quality.

Age (+0.016) → Each additional year of age is associated with about 0.016 points higher sleep quality, a small but statistically significant effect.

All predictors are statistically significant (p < 0.001).

Conclusion: Longer sleep is associated with higher sleep quality, while higher stress is associated with markedly lower quality. Age plays a minor but positive role.
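
To make the coefficients concrete, the fitted equation can be evaluated by hand. The sketch below plugs the rounded estimates from the results table into the linear formula for one hypothetical person; the inputs (7 hours of sleep, stress level 5, age 40) are invented for illustration.

```r
# Coefficients copied from the results table (rounded to three decimals).
coefs <- c(intercept = 3.587, sleep_duration = 0.677, stress_level = -0.328, age = 0.016)

# Hypothetical person: 7 hours of sleep, stress level 5, age 40.
# The leading 1 multiplies the intercept.
person <- c(1, 7, 5, 40)

# Predicted quality = intercept + sum of (coefficient * predictor value).
predicted_quality <- sum(coefs * person)
predicted_quality
```

This gives 3.587 + 0.677 × 7 − 0.328 × 5 + 0.016 × 40 = 7.326, so the model would predict a sleep quality of about 7.3 for this person.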

R Code: Linear Regression

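library(broom)  # provides tidy(), which returns model output as a tibble

# Fit a linear model of sleep quality on sleep duration, stress level, and age.
model <- lm(Quality.of.Sleep ~ Sleep.Duration + Stress.Level + Age, data = sleep)

# One row per term: estimate, standard error, t statistic, and p-value.
results <- tidy(model)
results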
## # A tibble: 4 × 5
##   term           estimate std.error statistic  p.value
##   <chr>             <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)      3.59     0.435        8.24 2.98e-15
## 2 Sleep.Duration   0.677    0.0448      15.1  1.37e-40
## 3 Stress.Level    -0.328    0.0208     -15.8  3.38e-43
## 4 Age              0.0156   0.00265      5.90 7.99e- 9

Statistical Analysis: ANOVA - Sleep Quality by Sleep Disorder

This ANOVA test checks whether mean Sleep Quality differs significantly among the Sleep Disorder groups (None, Insomnia, Sleep Apnea).

ANOVA Results: Sleep Quality by Sleep Disorder

Source           Df  Sum Sq  Mean Sq  F Value  p-Value
Sleep.Disorder    2   69.21    34.61     27.6  < 0.001
Residuals       371  465.18     1.25

The F-value tests whether mean Sleep Quality differs between the groups.

With F = 27.6 and p < 0.001, well below the 0.05 threshold, there is a statistically significant difference in sleep quality across Sleep Disorder categories.

The F-test alone does not identify which groups differ; that would take a post-hoc comparison such as Tukey's HSD. Typically, people without a disorder report higher quality, while those with Insomnia or Sleep Apnea show lower scores.

Conclusion: Sleep Quality differs significantly across Sleep Disorder groups, which is consistent with sleep disorders being associated with lower overall sleep quality.
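
The F value and its p-value can be rechecked from the sums of squares and degrees of freedom in the ANOVA table. The sketch below does that arithmetic in base R, using only the numbers reported in the table; the variable names are illustrative.

```r
# Values copied from the ANOVA table.
ss_between <- 69.21
df_between <- 2
ss_within  <- 465.18
df_within  <- 371

# Mean square = sum of squares / degrees of freedom.
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F is the ratio of between-group to within-group mean squares.
f_value <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(f_value, 1)
p_value < 0.001
```

The ratio rounds to the reported F of 27.6, and the p-value is far below 0.001, which is why the table rounds it to 0.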

Conclusion

The analysis shows that Sleep Duration, Stress Level, and Age are all statistically significant predictors of Sleep Quality. BMI categories are associated with differences in sleep quality, and sleep quality also differs significantly across Sleep Disorder groups. These results are associations rather than proof of cause, but they highlight the role of lifestyle and health factors in maintaining healthy sleep patterns.