1. Introduction

This analysis explores the Sleep Health and Lifestyle Dataset, which records 1,500 people: their age, gender, occupation, physical activity, daily steps, stress level, sleep duration and quality, BMI category, sleep disorder status, and health measures such as blood pressure and heart rate.

Our goal is to understand: What factors affect how well and how long people sleep?

2. Load Libraries

library(ggplot2)
library(corrplot)

3. Load and Explore the Data

sleep_data <- read.csv("expanded_sleep_health_dataset.csv")

# Basic look at the data
dim(sleep_data)        # rows and columns

## [1] 1500   13

str(sleep_data)        # data types

## 'data.frame':    1500 obs. of  13 variables:
##  $ Person.ID              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Gender                 : chr  "Male" "Male" "Female" "Male" ...
##  $ Age                    : int  41 42 45 24 18 78 41 21 50 68 ...
##  $ Occupation             : chr  "Nurse" "Software Engineer" "Doctor" "Writer" ...
##  $ Sleep.Duration         : num  7 6.7 8.7 8 8.8 7.7 6.8 6.7 7.2 7.5 ...
##  $ Quality.of.Sleep       : int  5 7 4 6 6 5 5 7 6 7 ...
##  $ Physical.Activity.Level: int  91 123 49 4 66 34 35 30 37 26 ...
##  $ Stress.Level           : int  10 5 7 5 4 6 9 6 7 6 ...
##  $ BMI.Category           : chr  "Normal" "Overweight" "Normal" "Normal" ...
##  $ Blood.Pressure         : chr  "111/71.15" "129/85.85000000000001" "119/79.35000000000001" "98/62.7" ...
##  $ Heart.Rate             : int  56 67 60 97 71 76 56 80 65 89 ...
##  $ Daily.Steps            : int  9476 10661 5033 1610 6574 6347 3866 4009 3605 2375 ...
##  $ Sleep.Disorder         : chr  "None" "None" "None" "None" ...

summary(sleep_data)    # quick stats for every column

##    Person.ID         Gender               Age         Occupation       
##  Min.   :   1.0   Length:1500        Min.   :18.00   Length:1500       
##  1st Qu.: 375.8   Class :character   1st Qu.:33.00   Class :character  
##  Median : 750.5   Mode  :character   Median :47.00   Mode  :character  
##  Mean   : 750.5                      Mean   :48.39                     
##  3rd Qu.:1125.2                      3rd Qu.:64.00                     
##  Max.   :1500.0                      Max.   :80.00                     
##  Sleep.Duration   Quality.of.Sleep Physical.Activity.Level  Stress.Level   
##  Min.   : 5.100   Min.   : 1.000   Min.   :  0.00          Min.   : 1.000  
##  1st Qu.: 7.200   1st Qu.: 5.000   1st Qu.: 30.00          1st Qu.: 4.000  
##  Median : 7.700   Median : 6.000   Median : 55.00          Median : 6.000  
##  Mean   : 7.752   Mean   : 5.825   Mean   : 59.19          Mean   : 6.012  
##  3rd Qu.: 8.400   3rd Qu.: 7.000   3rd Qu.: 83.25          3rd Qu.: 8.000  
##  Max.   :10.000   Max.   :10.000   Max.   :180.00          Max.   :10.000  
##  BMI.Category       Blood.Pressure       Heart.Rate      Daily.Steps   
##  Length:1500        Length:1500        Min.   : 43.00   Min.   : 1000  
##  Class :character   Class :character   1st Qu.: 66.00   1st Qu.: 3984  
##  Mode  :character   Mode  :character   Median : 75.00   Median : 5824  
##                                        Mean   : 74.76   Mean   : 6120  
##                                        3rd Qu.: 83.00   3rd Qu.: 8028  
##                                        Max.   :109.00   Max.   :16036  
##  Sleep.Disorder    
##  Length:1500       
##  Class :character  
##  Mode  :character  
##                    
##                    
##

# How many missing values?
cat("Missing values per column:\n")

## Missing values per column:

colSums(is.na(sleep_data))

##               Person.ID                  Gender                     Age 
##                       0                       0                       0 
##              Occupation          Sleep.Duration        Quality.of.Sleep 
##                       0                       0                       0 
## Physical.Activity.Level            Stress.Level            BMI.Category 
##                       0                       0                       0 
##          Blood.Pressure              Heart.Rate             Daily.Steps 
##                       0                       0                       0 
##          Sleep.Disorder 
##                       0

What this tells us: The dataset has 1,500 rows and 13 columns, and none of the columns contain missing (NA) values. The 961 people without a diagnosed sleep disorder are not blank: their Sleep Disorder value is recorded as the text “None”.
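A minimal sketch of why `colSums(is.na(...))` reports zero here: `is.na()` only flags genuine `NA` values, so a text label such as "None" or an empty string counts as present. The toy vector below is hypothetical, not taken from the dataset.

```r
# Hypothetical toy column standing in for sleep_data$Sleep.Disorder
disorder <- c("None", "Insomnia", "None", NA, "", "Sleep Apnea")

# is.na() detects only genuine NA values
n_na <- sum(is.na(disorder))

# Empty strings and "None" labels are ordinary text, not NA
n_blank <- sum(disorder == "", na.rm = TRUE)
n_none  <- sum(disorder == "None", na.rm = TRUE)

cat("NA:", n_na, " blank:", n_blank, " 'None':", n_none, "\n")
```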

4. Data Preprocessing

Before analysis, we clean and prepare the data.

# Split "Blood Pressure" (e.g. "120/80") into two separate number columns
bp_split <- strsplit(as.character(sleep_data$Blood.Pressure), "/")
sleep_data$Systolic  <- as.numeric(sapply(bp_split, "[", 1))
sleep_data$Diastolic <- as.numeric(sapply(bp_split, "[", 2))

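# Sleep Disorder has no NA values: people without a disorder are already labelled "None".
# As a safeguard, recode any NA or blank entries to that same "None" label.
sleep_data$Sleep.Disorder[is.na(sleep_data$Sleep.Disorder) | sleep_data$Sleep.Disorder == ""] <- "None"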

# Tell R which columns are categories (not numbers)
sleep_data$Gender         <- as.factor(sleep_data$Gender)
sleep_data$Occupation     <- as.factor(sleep_data$Occupation)
sleep_data$BMI.Category   <- as.factor(sleep_data$BMI.Category)
sleep_data$Sleep.Disorder <- as.factor(sleep_data$Sleep.Disorder)

# Remove the original Blood Pressure text column (we now have Systolic and Diastolic)
sleep_data <- sleep_data[, !(names(sleep_data) %in% c("Blood.Pressure"))]

cat("Done! Final columns:\n")

## Done! Final columns:

colnames(sleep_data)

##  [1] "Person.ID"               "Gender"                 
##  [3] "Age"                     "Occupation"             
##  [5] "Sleep.Duration"          "Quality.of.Sleep"       
##  [7] "Physical.Activity.Level" "Stress.Level"           
##  [9] "BMI.Category"            "Heart.Rate"             
## [11] "Daily.Steps"             "Sleep.Disorder"         
## [13] "Systolic"                "Diastolic"

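What this tells us: We turned the single “Blood Pressure” text column into two numeric columns, Systolic and Diastolic, and marked the four text columns as categories (factors). Sleep Disorder needed no filling in: it has no missing values, and people without a disorder keep the label “None”. The data is now clean and ready.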
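The blood pressure split can be checked on a few sample strings copied from the `str()` output above. This sketch passes `fixed = TRUE` (a plain-text split instead of a regular expression), a small deviation from the original call that gives the same result for "/".

```r
# Sample Blood.Pressure strings copied from the str() output above
bp <- c("111/71.15", "129/85.85000000000001", "98/62.7")

# strsplit() returns a list; each element holds c("systolic", "diastolic")
parts <- strsplit(bp, "/", fixed = TRUE)

# `[` used as a function picks the 1st or 2nd piece of each element
systolic  <- as.numeric(sapply(parts, `[`, 1))
diastolic <- as.numeric(sapply(parts, `[`, 2))

print(data.frame(bp, systolic, diastolic))
```

The long `85.85000000000001` value is floating-point residue in the source file; for practical purposes `as.numeric()` reads it as 85.85.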

5. Descriptive Statistics

Here we summarise the key numbers in the dataset — who are these 1,500 people?

Q1: What is the average age of participants?

cat("Mean Age:", round(mean(sleep_data$Age), 1), "years\n")

## Mean Age: 48.4 years

cat("Youngest:", min(sleep_data$Age), "  Oldest:", max(sleep_data$Age), "\n")

## Youngest: 18   Oldest: 80

What this tells us: The average person in this dataset is about 48 years old, ranging from 18 to 80 — so we have a good mix of young and older adults.

Q2: How many men and women are in the dataset?

table(sleep_data$Gender)

## 
## Female   Male 
##    776    724

round(prop.table(table(sleep_data$Gender)) * 100, 1)

## 
## Female   Male 
##   51.7   48.3

What this tells us: The split is almost equal (48% male, 52% female), so neither gender dominates the results.

Q3: How stressed are people on average?

quantile(sleep_data$Stress.Level)

##   0%  25%  50%  75% 100% 
##    1    4    6    8   10

cat("Average stress:", round(mean(sleep_data$Stress.Level), 1), "out of 10\n")

## Average stress: 6 out of 10

What this tells us: The average stress level is around 6 out of 10. Half the people score 6 or higher — so stress is quite common in this group.

Q4: How common are sleep disorders?

disorder_table <- table(sleep_data$Sleep.Disorder)
disorder_table

## 
##              Insomnia            Narcolepsy                  None 
##                   171                    87                   961 
## Restless Leg Syndrome           Sleep Apnea 
##                   103                   178

round(prop.table(disorder_table) * 100, 1)

## 
##              Insomnia            Narcolepsy                  None 
##                  11.4                   5.8                  64.1 
## Restless Leg Syndrome           Sleep Apnea 
##                   6.9                  11.9

What this tells us: 64% of people have no sleep disorder. Among those who do, Sleep Apnea (12%) and Insomnia (11%) are the most common. Restless Leg Syndrome and Narcolepsy are less frequent.
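The shares quoted later in the report ("over 1 in 3" with any disorder, "roughly 1 in 4" with Sleep Apnea or Insomnia) follow directly from these counts. A short sketch that reuses the counts printed above:

```r
# Counts copied from the Q4 table above
counts <- c(Insomnia = 171, Narcolepsy = 87, None = 961,
            `Restless Leg Syndrome` = 103, `Sleep Apnea` = 178)

total        <- sum(counts)                             # 1500 participants
any_disorder <- (total - counts[["None"]]) / total      # everyone not labelled "None"

# Share of the two most common disorders combined
apnea_insomnia <- (counts[["Sleep Apnea"]] + counts[["Insomnia"]]) / total

cat("Any disorder:", round(100 * any_disorder, 1), "%\n")
cat("Sleep Apnea + Insomnia:", round(100 * apnea_insomnia, 1), "%\n")
```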

Q5: Which jobs have the most physically active people?

activity_by_job <- sort(tapply(sleep_data$Physical.Activity.Level,
                                sleep_data$Occupation, mean), decreasing = TRUE)
round(activity_by_job, 1)

##                 Chef                Nurse          Salesperson 
##                100.7                 92.2                 82.2 
##              Student              Teacher Sales Representative 
##                 77.4                 75.4                 71.4 
##    Software Engineer               Artist            Scientist 
##                 60.4                 54.7                 50.1 
##             Engineer              Manager           Accountant 
##                 49.0                 43.4                 38.5 
##               Doctor               Writer               Lawyer 
##                 36.6                 31.2                 24.9

What this tells us: Chefs are the most physically active, followed by Nurses and Salespersons. Lawyers, Writers and Doctors are the least active; Scientists sit in the middle of the range (about 50).

6. Data Extraction and Filtering

Here we pull out specific groups of people to look at them more closely.

Q6: How many people are “healthy sleepers”, sleeping more than 7 hours with a stress level below 5?

healthy <- subset(sleep_data, Sleep.Duration > 7 & Stress.Level < 5)
cat("Healthy sleepers:", nrow(healthy), "out of", nrow(sleep_data), "\n")

## Healthy sleepers: 375 out of 1500

head(healthy[, c("Gender", "Age", "Occupation", "Sleep.Duration", "Stress.Level")])

##    Gender Age        Occupation Sleep.Duration Stress.Level
## 5  Female  18       Salesperson            8.8            4
## 14   Male  78 Software Engineer            8.2            4
## 15 Female  23         Scientist            7.8            3
## 26   Male  77           Manager            9.5            4
## 29   Male  72         Scientist            8.0            2
## 30 Female  19           Student            7.4            4

What this tells us: Only 375 of the 1,500 participants (25%) tick both boxes: sleeping well AND having low stress. This group represents the low-risk category.
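Both conditions use strict inequalities, so exactly 7.0 hours, or a stress level of exactly 5, falls outside the group. A toy example (hypothetical rows, not from the dataset) makes the boundaries visible:

```r
# Hypothetical rows chosen to sit on and around the cut-offs
toy <- data.frame(Sleep.Duration = c(7.0, 7.1, 8.0, 6.5),
                  Stress.Level   = c(3,   5,   4,   2))

# Same rule as Q6: strictly more than 7 hours AND stress strictly below 5
healthy_toy <- subset(toy, Sleep.Duration > 7 & Stress.Level < 5)
print(healthy_toy)
```

Row 1 misses on sleep (7.0 is not greater than 7) and row 2 misses on stress (5 is not below 5). Switching to `>=` or `<=` would change the count of 375, so the cut-offs are worth reporting alongside the result.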

Q7: Among overweight people, how many also have high stress?

high_risk <- subset(sleep_data, BMI.Category == "Overweight" & Stress.Level > 7)
cat("Overweight + High Stress:", nrow(high_risk), "people\n")

## Overweight + High Stress: 131 people

What this tells us: 131 people are both overweight and highly stressed (stress level 8 to 10), a group that may be at elevated risk of sleep problems and other health complications.

Q8: What sleep disorders do women have?

female_disorders <- sleep_data[sleep_data$Gender == "Female" &
                                sleep_data$Sleep.Disorder != "None", ]
cat("Women with a sleep disorder:", nrow(female_disorders), "\n")

## Women with a sleep disorder: 269

# droplevels() removes the now-empty "None" level from the table
table(droplevels(female_disorders$Sleep.Disorder))

## 
##              Insomnia            Narcolepsy Restless Leg Syndrome 
##                    74                    42                    49 
##           Sleep Apnea 
##                   104

What this tells us: 269 of the 776 women (about 35%) have a diagnosed sleep disorder. Sleep Apnea is the most common (104), followed by Insomnia (74), Restless Leg Syndrome (49) and Narcolepsy (42).
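Two base-R behaviours matter when filtering a factor by label. Comparing against a label that never occurs (for example "No Disorder" when the data says "None") keeps every row, and subsetting a factor keeps its unused levels, which then appear in `table()` with a count of 0. A toy sketch with hypothetical values:

```r
# Hypothetical factor mimicking Sleep.Disorder
disorder <- factor(c("None", "Insomnia", "None", "Sleep Apnea"))

# A label that does not exist matches nothing, so != keeps every row
kept_wrong <- disorder[disorder != "No Disorder"]
length(kept_wrong)            # 4: nothing was excluded

# Filtering on the label actually used in the data works...
with_disorder <- disorder[disorder != "None"]

# ...but the empty "None" level is still tabulated with a count of 0
table(with_disorder)

# droplevels() removes levels that no longer have any observations
table(droplevels(with_disorder))
```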

7. Visualization

Charts make patterns easier to see. Each chart below answers one clear question.

Chart 1: How is sleep duration spread across all participants?

hist(sleep_data$Sleep.Duration,
     main   = "How Many Hours Do People Sleep?",
     xlab   = "Sleep Duration (hours)",
     col    = "lightsteelblue",
     border = "white",
     breaks = 20)
abline(v = mean(sleep_data$Sleep.Duration), col = "red", lwd = 2, lty = 2)
legend("topright", legend = paste("Average:", round(mean(sleep_data$Sleep.Duration),1), "hrs"),
       col = "red", lty = 2, lwd = 2)

What this tells us: The middle half of participants sleep between 7.2 and 8.4 hours (the quartiles from the summary above), and the red dashed line marks the average of about 7.8 hours. The histogram is roughly bell-shaped, so sleep duration appears approximately normally distributed across participants.
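Statements such as "most people sleep between X and Y hours" can be checked directly by computing the share of observations inside a range. A minimal sketch with a hypothetical helper `share_between()` and toy values; on the real data one would pass `sleep_data$Sleep.Duration`:

```r
# Hypothetical helper: fraction of values within [lower, upper]
share_between <- function(x, lower, upper) {
  mean(x >= lower & x <= upper, na.rm = TRUE)
}

# Toy sleep durations in hours (not from the dataset)
hours <- c(5.1, 6.8, 7.2, 7.7, 8.0, 8.4, 9.3, 10.0)

share_between(hours, 7.2, 8.4)   # 4 of 8 values -> 0.5
share_between(hours, 6, 9)       # 5 of 8 values -> 0.625
```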

Chart 2: Does more stress mean less sleep?

ggplot(sleep_data, aes(x = Stress.Level, y = Sleep.Duration)) +
  geom_point(alpha = 0.3, colour = "steelblue") +
  geom_smooth(method = "lm", colour = "red", se = FALSE) +
  labs(title = "Stress Level vs Sleep Duration",
       subtitle = "Does stress reduce how much people sleep?",
       x = "Stress Level (1-10)", y = "Sleep Duration (hours)") +
  theme_minimal()

What this tells us: Yes. The red line slopes downward, so higher stress goes with fewer hours of sleep. This is one of the strongest patterns in the dataset, although an association like this does not by itself show that stress causes shorter sleep.

Chart 3: How does sleep quality differ by BMI group?

ggplot(sleep_data, aes(x = BMI.Category, y = Quality.of.Sleep, fill = BMI.Category)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 16) +
  labs(title = "Sleep Quality by BMI Category",
       subtitle = "Does body weight affect how well people sleep?",
       x = "BMI Category", y = "Quality of Sleep (1-10)") +
  theme_minimal() +
  theme(legend.position = "none")

What this tells us: Each box shows the range of sleep quality scores for that BMI group. The middle line is the median (typical person). All groups are broadly similar here, but Obese and Overweight individuals show slightly more variation in their sleep quality.

Chart 4: Does having a sleep disorder relate to higher stress?

ggplot(sleep_data, aes(x = Sleep.Disorder, y = Stress.Level, fill = Sleep.Disorder)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 16) +
  labs(title = "Stress Level by Sleep Disorder Type",
       subtitle = "Are stressed people more likely to have a sleep disorder?",
       x = "Sleep Disorder", y = "Stress Level (1-10)") +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 20, hjust = 1))

What this tells us: People with Insomnia and Sleep Apnea tend to have higher stress levels compared to those with no disorder. This suggests stress and sleep disorders often go together.

Chart 5: Does age affect blood pressure?

ggplot(sleep_data, aes(x = Age, y = Systolic)) +
  geom_point(alpha = 0.3, colour = "darkred") +
  geom_smooth(method = "lm", colour = "blue", se = FALSE) +
  labs(title = "Age vs Systolic Blood Pressure",
       subtitle = "Does blood pressure rise as people get older?",
       x = "Age (years)", y = "Systolic Blood Pressure (mmHg)") +
  theme_minimal()

What this tells us: The blue line slopes upward: blood pressure tends to increase with age, in line with well-established medical findings. The relationship is moderate rather than strong (r ≈ 0.35), so age explains only part of the variation in blood pressure.

Chart 6: How many people have each type of sleep disorder?

ggplot(sleep_data, aes(x = Sleep.Disorder, fill = Sleep.Disorder)) +
  geom_bar() +
  labs(title = "How Common is Each Sleep Disorder?",
       x = "Sleep Disorder", y = "Number of People") +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 20, hjust = 1))

What this tells us: Nearly two-thirds of participants (64%) have no disorder. Among the disorders, Sleep Apnea and Insomnia are the most common, together affecting roughly 1 in 4 participants (23%).

Chart 7: Which occupations sleep the least?

avg_sleep_job <- aggregate(Sleep.Duration ~ Occupation, data = sleep_data, mean)
avg_sleep_job <- avg_sleep_job[order(avg_sleep_job$Sleep.Duration), ]

ggplot(avg_sleep_job, aes(x = reorder(Occupation, Sleep.Duration), y = Sleep.Duration)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Average Sleep Duration by Occupation",
       subtitle = "Which jobs are linked to less sleep?",
       x = "Occupation", y = "Average Sleep Duration (hours)") +
  theme_minimal()

What this tells us: Average sleep duration varies across occupations. Jobs at the bottom of the chart have the shortest average sleep, which may help identify professional groups that could benefit from targeted wellness support.

8. Correlation Analysis

What is correlation? It measures how strongly two things are related.

A value near +1 means: when one goes up, the other goes up too (positive)
A value near –1 means: when one goes up, the other goes down (negative)
A value near 0 means: no clear connection (zero correlation)
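Pearson's r is the covariance of two variables divided by the product of their standard deviations. A short sketch computing it by hand and checking it against R's built-in `cor()`, using toy numbers (hypothetical, not from the dataset):

```r
# Pearson correlation from its definition
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Toy data: higher stress paired with shorter sleep
stress <- c(2, 4, 6, 8, 10)
sleep  <- c(9.0, 8.1, 7.9, 7.0, 6.2)

r_manual  <- pearson_r(stress, sleep)
r_builtin <- cor(stress, sleep)

cat("manual:", round(r_manual, 3), " cor():", round(r_builtin, 3), "\n")
```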

Correlation Heatmap — All Variables at Once

numeric_vars <- sleep_data[, c("Age", "Sleep.Duration", "Quality.of.Sleep",
                                "Physical.Activity.Level", "Stress.Level",
                                "Heart.Rate", "Daily.Steps", "Systolic", "Diastolic")]

cor_matrix <- round(cor(numeric_vars), 2)

corrplot(cor_matrix,
         method      = "color",
         type        = "upper",
         addCoef.col = "black",
         tl.col      = "black",
         tl.srt      = 45,
         number.cex  = 0.75,
         col         = colorRampPalette(c("firebrick3", "white", "steelblue4"))(200),
         title       = "Correlation Heatmap – Sleep Health Variables",
         mar         = c(0, 0, 2, 0))

How to read this chart: Each square shows the correlation between two variables. Dark blue = strong positive link. Dark red = strong negative link. White = no clear link. The number inside each square is the exact correlation value.

Key findings from this heatmap:

Variable pair	Correlation (r)	Plain English
Systolic ↔ Diastolic	+0.91	Both blood pressure readings rise and fall together, as expected
Stress Level ↔ Sleep Duration	–0.51	More stress goes with fewer hours of sleep; the strongest lifestyle link
Daily Steps ↔ Heart Rate	–0.42	More steps per day go with a lower resting heart rate
Age ↔ Systolic BP	+0.35	Older people tend to have higher blood pressure
Stress Level ↔ Quality of Sleep	–0.24	Higher stress goes with slightly lower sleep quality

Significance Test — Is the stress–sleep link real or random?

cor.test(sleep_data$Stress.Level, sleep_data$Sleep.Duration)

## 
##  Pearson's product-moment correlation
## 
## data:  sleep_data$Stress.Level and sleep_data$Sleep.Duration
## t = -22.7, df = 1498, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5426273 -0.4672593
## sample estimates:
##        cor 
## -0.5059082

What this tells us: The p-value is the probability of seeing a correlation at least this strong if stress and sleep duration were actually unrelated. Here it is far below 0.05 (p < 2.2e-16), so the link is very unlikely to be a chance result of this sample. The estimated correlation of about –0.51, with a 95% confidence interval of –0.54 to –0.47, indicates a moderately strong negative relationship.
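The t statistic and confidence interval in the `cor.test()` output can be reproduced from r and the sample size alone: t = r·√(n − 2)/√(1 − r²), and the interval comes from Fisher's z transform (z = atanh(r), standard error 1/√(n − 3)). A sketch using the printed estimate:

```r
r <- -0.5059082   # sample estimate printed by cor.test() above
n <- 1500         # number of participants

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z transform
z    <- atanh(r)
se   <- 1 / sqrt(n - 3)
crit <- qnorm(0.975)
ci   <- tanh(z + c(-1, 1) * crit * se)

cat("t =", round(t_stat, 1), " p =", signif(p_value, 2), "\n")
cat("95% CI:", round(ci, 4), "\n")
```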

9. Regression Analysis

What is regression? It fits a “best fit line” through a scatter of points and lets us say: “For every 1-unit increase in X, Y changes by this much.” It also reports R² (R-squared), the share of the variation in Y that the model explains: the closer R² is to 1, the better the fit.
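For a single predictor, the least-squares line has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) − slope × mean(x). A sketch checking this against `lm()` on toy data (hypothetical values), and computing R² from its definition:

```r
# Toy data (not from the dataset)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(8.9, 8.5, 8.4, 8.0, 7.9, 7.6)

# Closed-form ordinary least squares for one predictor
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() should give the same two coefficients
fit <- lm(y ~ x)
print(coef(fit))

# R-squared = 1 - residual sum of squares / total sum of squares
r_squared <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
```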

Simple Regression 1: Does stress predict sleep duration?

model1 <- lm(Sleep.Duration ~ Stress.Level, data = sleep_data)
summary(model1)

## 
## Call:
## lm(formula = Sleep.Duration ~ Stress.Level, data = sleep_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.36151 -0.53350  0.02355  0.53102  2.52355 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.909228   0.054798   162.6   <2e-16 ***
## Stress.Level -0.192531   0.008482   -22.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7772 on 1498 degrees of freedom
## Multiple R-squared:  0.2559, Adjusted R-squared:  0.2554 
## F-statistic: 515.3 on 1 and 1498 DF,  p-value: < 2.2e-16

ggplot(sleep_data, aes(x = Stress.Level, y = Sleep.Duration)) +
  geom_point(alpha = 0.3, colour = "salmon") +
  geom_smooth(method = "lm", colour = "darkred", se = TRUE) +
  labs(title = "Regression: Stress Level → Sleep Duration",
       subtitle = "The line shows the predicted sleep duration for each stress level",
       x = "Stress Level (1-10)", y = "Sleep Duration (hours)") +
  theme_minimal()

What this tells us: The line slopes downward: each 1-point increase in stress is associated with about 0.19 fewer hours of sleep (roughly 12 minutes). The shaded band shows the 95% confidence interval for the line. R² = 0.256, so stress alone explains about 26% of the variation in sleep duration.
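The slope can be turned into minutes and into predictions using the coefficients printed by `summary(model1)`. In simple regression R² also equals r², which ties this model back to the correlation of –0.506. A sketch with the coefficients typed in from the output above; with the data loaded, `predict(model1, newdata = data.frame(Stress.Level = c(2, 8)))` should give the same predictions.

```r
# Coefficients copied from summary(model1)
intercept <- 8.909228
slope     <- -0.192531   # hours of sleep per 1-point rise in stress

minutes_per_point <- abs(slope) * 60

# Predicted sleep duration at a low and a high stress level
stress <- c(2, 8)
predicted_hours <- intercept + slope * stress

# In simple regression, R-squared equals the squared correlation
r <- -0.5059082
r_squared_from_r <- r^2

cat("Minutes lost per stress point:", round(minutes_per_point, 1), "\n")
cat("Predicted hours at stress 2 and 8:", round(predicted_hours, 2), "\n")
cat("r^2 =", round(r_squared_from_r, 4), "\n")
```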

Simple Regression 2: Does age predict blood pressure?

model2 <- lm(Systolic ~ Age, data = sleep_data)
summary(model2)

## 
## Call:
## lm(formula = Systolic ~ Age, data = sleep_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.3961  -7.2221  -0.8693   7.3325  23.5376 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 110.39386    0.73515   150.2   <2e-16 ***
## Age           0.20343    0.01422    14.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10 on 1498 degrees of freedom
## Multiple R-squared:  0.1202, Adjusted R-squared:  0.1196 
## F-statistic: 204.6 on 1 and 1498 DF,  p-value: < 2.2e-16

ggplot(sleep_data, aes(x = Age, y = Systolic)) +
  geom_point(alpha = 0.3, colour = "darkgreen") +
  geom_smooth(method = "lm", colour = "blue", se = TRUE) +
  labs(title = "Regression: Age → Systolic Blood Pressure",
       subtitle = "The line shows predicted blood pressure at each age",
       x = "Age (years)", y = "Systolic BP (mmHg)") +
  theme_minimal()

What this tells us: Each extra year of age is associated with about 0.2 mmHg higher systolic blood pressure (roughly 2 mmHg per decade). R² = 0.120, so age alone explains about 12% of the variation in systolic blood pressure; most of the variation comes from other factors.
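Using the printed coefficients of `model2`, a sketch of what the slope means across a 50-year age gap, and of the correlation implied by R² (its square root, taken as positive because the slope is positive):

```r
# Coefficients and R-squared copied from summary(model2)
intercept <- 110.39386
slope     <- 0.20343     # mmHg per extra year of age
r_squared <- 0.1202

ages <- c(20, 70)
predicted_systolic <- intercept + slope * ages
gap <- diff(predicted_systolic)  # predicted difference between a 70- and a 20-year-old

# In simple regression r = sign(slope) * sqrt(R-squared)
r_implied <- sign(slope) * sqrt(r_squared)

cat("Predicted systolic at 20 and 70:", round(predicted_systolic, 1), "\n")
cat("Difference:", round(gap, 1), "mmHg;  implied r:", round(r_implied, 2), "\n")
```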

Multiple Regression: What combination of factors best predicts sleep duration?

model3 <- lm(Sleep.Duration ~ Stress.Level + Physical.Activity.Level +
               Age + Heart.Rate + Daily.Steps,
             data = sleep_data)
summary(model3)

## 
## Call:
## lm(formula = Sleep.Duration ~ Stress.Level + Physical.Activity.Level + 
##     Age + Heart.Rate + Daily.Steps, data = sleep_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.39771 -0.53024  0.01216  0.51986  2.57172 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              8.835e+00  1.775e-01  49.768   <2e-16 ***
## Stress.Level            -1.927e-01  8.516e-03 -22.630   <2e-16 ***
## Physical.Activity.Level -1.026e-03  1.526e-03  -0.673    0.501    
## Age                     -5.690e-04  1.107e-03  -0.514    0.607    
## Heart.Rate               1.438e-03  1.841e-03   0.781    0.435    
## Daily.Steps              9.122e-06  2.042e-05   0.447    0.655    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7777 on 1494 degrees of freedom
## Multiple R-squared:  0.2571, Adjusted R-squared:  0.2547 
## F-statistic: 103.4 on 5 and 1494 DF,  p-value: < 2.2e-16

What this tells us: When we add the other predictors:

Stress Level remains the only significant predictor (p < 2e-16), with almost the same effect as before (about –0.19 hours per point)
Physical Activity, Age, Heart Rate and Daily Steps are not significant (all p > 0.4); their estimates are tiny and indistinguishable from zero
The Adjusted R² (0.2547) is slightly lower than in the stress-only model (0.2554), so the extra variables add no real predictive value
The overall model is statistically significant (F-test p < 0.001), but that is driven by Stress Level alone
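Adjusted R² penalises each extra predictor: adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors. Plugging in the printed values shows why four predictors that explain almost nothing lower it slightly. With the data loaded, `anova(model1, model3)` would give a formal F-test of whether those four predictors add anything. Sketch:

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 1500
adj_model1 <- adj_r2(0.2559, n, p = 1)   # stress only
adj_model3 <- adj_r2(0.2571, n, p = 5)   # stress + 4 extra predictors

cat("Model 1:", round(adj_model1, 4), " Model 3:", round(adj_model3, 4), "\n")
```

Small differences in the last digit compared with the `summary()` output come from using the rounded R² values as inputs.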

Model Diagnostics — Is our regression model reliable?

par(mfrow = c(2, 2))
plot(model3)

par(mfrow = c(1, 1))

What these 4 plots check (in simple terms):

Top-left (Residuals vs Fitted): Points should scatter randomly, with no curved pattern. If it looks random, a straight-line model is a reasonable fit.
Top-right (Normal Q-Q): Points should follow the diagonal line. This checks that the errors are “normally distributed.”
Bottom-left (Scale-Location): The red line should be roughly flat, which checks that the spread of the errors is consistent across all predictions.
Bottom-right (Residuals vs Leverage): Flags influential points (those with a high Cook's distance) that pull the regression line noticeably toward themselves.
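Some of these checks can also be made numerically. A minimal sketch on simulated data (hypothetical, generated with a fixed seed) that flags influential points with Cook's distance using the common 4/n rule of thumb (a convention, not a strict test) and runs a Shapiro-Wilk test on the residuals as a numeric companion to the Q-Q plot. On the real model one would call `cooks.distance(model3)` and `residuals(model3)`.

```r
set.seed(42)

# Simulated data: a straight-line relationship plus noise
n <- 200
x <- runif(n, 1, 10)
y <- 9 - 0.2 * x + rnorm(n, sd = 0.8)
y[1] <- 15                       # plant one gross outlier

fit <- lm(y ~ x)

# Cook's distance measures how much each point shifts the fitted line
cd <- cooks.distance(fit)
influential <- which(cd > 4 / n)
cat("Points above 4/n:", influential, "\n")

# Q-Q check in numbers: a small p-value suggests non-normal residuals
print(shapiro.test(residuals(fit))$p.value)
```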

Model Comparison — Simple vs Multiple Regression

comparison <- data.frame(
  Model = c("Simple: Stress → Sleep Duration",
            "Simple: Age → Systolic BP",
            "Multiple: Combined → Sleep Duration"),
  R_Squared = c(round(summary(model1)$r.squared, 3),
                round(summary(model2)$r.squared, 3),
                round(summary(model3)$r.squared, 3)),
  Adj_R_Squared = c(round(summary(model1)$adj.r.squared, 3),
                    round(summary(model2)$adj.r.squared, 3),
                    round(summary(model3)$adj.r.squared, 3))
)
print(comparison)

##                                 Model R_Squared Adj_R_Squared
## 1     Simple: Stress → Sleep Duration     0.256         0.255
## 2           Simple: Age → Systolic BP     0.120         0.120
## 3 Multiple: Combined → Sleep Duration     0.257         0.255

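What this tells us: R² is the share of variation in the outcome that each model explains. Adding Physical Activity, Age, Heart Rate and Daily Steps raises R² only from 0.256 to 0.257, and the adjusted R² stays at 0.255, so stress carries almost all of the predictive information about sleep duration. The Age → Systolic BP model predicts a different outcome, so its R² (0.120) should not be compared directly with the two sleep models.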

10. Conclusion

After analysing 1,500 participants, here are the key takeaways in plain English:

Stress reduces sleep the most. Out of everything we measured, stress level had the strongest link to how long people sleep (r = –0.51). The more stressed someone is, the fewer hours they sleep.
Active people have healthier hearts. People who walk more steps per day tend to have a noticeably lower resting heart rate (r = –0.42) — a sign of better cardiovascular fitness.
Blood pressure rises with age. This matches well-established medical findings, although the link in our data is moderate (r ≈ 0.35, R² = 0.12).
Sleep disorders are common. Over 1 in 3 participants has a diagnosed sleep condition. Insomnia and Sleep Apnea are linked to higher stress levels.
Most of the variation in sleep is still unexplained. Stress alone explains about 26% of the variation in sleep duration; adding physical activity, age, heart rate and daily steps barely changes that, so factors not measured here likely account for the rest.

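Bottom line: In this population, stress is the factor most strongly associated with sleep duration, so managing stress looks like the most promising way to improve sleep. Staying physically active is linked to a healthier resting heart rate, although activity showed no independent link to sleep duration here. Because the data is observational, these associations do not prove cause and effect.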

Sleep Health and Lifestyle Analysis

Nazeel Ahamad and Athul Shaji

2026-05-05