Student: Katja Volk Štefić
mydata <- read.table("./Sleep_health_and_lifestyle_dataset.csv", header=TRUE, sep = ",", dec = ".")
head(mydata)
## Person.ID Gender Age Occupation Sleep.Duration
## 1 1 Male 27 Software Engineer 6.1
## 2 2 Male 28 Doctor 6.2
## 3 3 Male 28 Doctor 6.2
## 4 4 Male 28 Sales Representative 5.9
## 5 5 Male 28 Sales Representative 5.9
## 6 6 Male 28 Software Engineer 5.9
## Quality.of.Sleep Physical.Activity.Level Stress.Level BMI.Category
## 1 6 42 6 Overweight
## 2 6 60 8 Normal
## 3 6 60 8 Normal
## 4 4 30 8 Obese
## 5 4 30 8 Obese
## 6 4 30 8 Obese
## Blood.Pressure Heart.Rate Daily.Steps Sleep.Disorder
## 1 126/83 77 4200 None
## 2 125/80 75 10000 None
## 3 125/80 75 10000 None
## 4 140/90 85 3000 Sleep Apnea
## 5 140/90 85 3000 Sleep Apnea
## 6 140/90 85 3000 Insomnia
The unit of observation is one participant in the study.
The sample size is 374.
Explanation of data:
Person ID: An identifier for each individual.
Gender: The gender of the person (Male/Female).
Age: The age of the person in years.
Occupation: The occupation or profession of the person.
Sleep Duration: The number of hours the person sleeps per day.
Quality of Sleep: A subjective rating of the quality of sleep, ranging from 1 to 10.
Physical Activity Level: The number of minutes the person engages in physical activity daily.
Stress Level: A subjective rating of the stress level experienced by the person, ranging from 1 to 10.
BMI Category: The BMI category of the person (in this dataset: Normal, Normal Weight, Overweight, Obese).
Blood Pressure (systolic/diastolic): The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.
Heart Rate: The resting heart rate of the person in beats per minute.
Daily Steps: The number of steps the person takes per day.
Sleep Disorder: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).
The source of the data is Kaggle.
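Blood pressure is stored as text in the form systolic/diastolic, so it cannot be summarised numerically as it is. A minimal sketch of how it could be split into two numeric variables is shown here; the vector `bp` holds a few example values in the same format, and the names `systolic` and `diastolic` are illustrative, not part of the dataset.

```r
# Example values in the "systolic/diastolic" text format (illustrative vector)
bp <- c("126/83", "125/80", "140/90")

# Split each string at the slash; fixed = TRUE treats "/" literally
parts <- strsplit(bp, "/", fixed = TRUE)

# Take the first and second piece of every pair and convert to numbers
systolic  <- as.numeric(sapply(parts, `[`, 1))
diastolic <- as.numeric(sapply(parts, `[`, 2))

data.frame(bp, systolic, diastolic)
```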
mydata$GenderF <- factor(mydata$Gender,
                         levels = c("Male", "Female"),
                         labels = c("Male", "Female"))
mydata$OccupationF <- factor(mydata$Occupation,
                             levels = c("Accountant", "Doctor", "Engineer", "Lawyer", "Manager", "Nurse", "Sales Representative", "Salesperson", "Scientist", "Software Engineer", "Teacher"),
                             labels = c("Accountant", "Doctor", "Engineer", "Lawyer", "Manager", "Nurse", "Sales Representative", "Salesperson", "Scientist", "Software Engineer", "Teacher"))
mydata$BMI.CategoryF <- factor(mydata$BMI.Category,
                               levels = c("Normal", "Normal Weight", "Obese", "Overweight"),
                               labels = c("Normal", "Normal Weight", "Obese", "Overweight"))
mydata$Sleep.DisorderF <- factor(mydata$Sleep.Disorder,
                                 levels = c("None", "Insomnia", "Sleep Apnea"),
                                 labels = c("None", "Insomnia", "Sleep Apnea"))
#Changing variables into factors
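The BMI variable contains both "Normal" and "Normal Weight". If these two labels describe the same category (an assumption; the data description does not confirm it), they could be merged while creating the factor by giving both levels the same label. The sketch below uses a small made-up vector `bmi` rather than the real column.

```r
# Made-up example values with the same category names as BMI.Category
bmi <- c("Overweight", "Normal", "Normal Weight", "Obese", "Normal")

# Giving two levels the same label merges them into one factor level
# (assumes "Normal" and "Normal Weight" mean the same thing)
bmi_f <- factor(bmi,
                levels = c("Normal", "Normal Weight", "Overweight", "Obese"),
                labels = c("Normal", "Normal", "Overweight", "Obese"))

levels(bmi_f)
table(bmi_f)
```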
summary(mydata[ , c(-1, -2, -4, -9, -10, -13)])
## Age Sleep.Duration Quality.of.Sleep
## Min. :27.00 Min. :5.800 Min. :4.000
## 1st Qu.:35.25 1st Qu.:6.400 1st Qu.:6.000
## Median :43.00 Median :7.200 Median :7.000
## Mean :42.18 Mean :7.132 Mean :7.313
## 3rd Qu.:50.00 3rd Qu.:7.800 3rd Qu.:8.000
## Max. :59.00 Max. :8.500 Max. :9.000
##
## Physical.Activity.Level Stress.Level Heart.Rate
## Min. :30.00 Min. :3.000 Min. :65.00
## 1st Qu.:45.00 1st Qu.:4.000 1st Qu.:68.00
## Median :60.00 Median :5.000 Median :70.00
## Mean :59.17 Mean :5.385 Mean :70.17
## 3rd Qu.:75.00 3rd Qu.:7.000 3rd Qu.:72.00
## Max. :90.00 Max. :8.000 Max. :86.00
##
## Daily.Steps GenderF OccupationF BMI.CategoryF
## Min. : 3000 Male :189 Nurse :73 Normal :195
## 1st Qu.: 5600 Female:185 Doctor :71 Normal Weight: 21
## Median : 7000 Engineer :63 Obese : 10
## Mean : 6817 Lawyer :47 Overweight :148
## 3rd Qu.: 8000 Teacher :40
## Max. :10000 Accountant:37
## (Other) :43
## Sleep.DisorderF
## None :219
## Insomnia : 77
## Sleep Apnea: 78
##
##
##
##
#Descriptive statistics
Daily steps - Median:7000
Half of the participants take 7000 steps or fewer per day, the other half take more than 7000 steps per day.
Age - 3rd Qu.:50.00
75% of participants are 50 years old or younger, 25% of participants are older than 50.
Physical activity level - Max.:90.00
The maximum physical activity level (number of minutes the person engages in physical activity daily) is 90.
library(ggplot2)
ggplot(data = mydata, aes(x = Stress.Level, y = Sleep.Duration)) +
  geom_bar(stat = "summary", fun = "mean", na.rm = TRUE) +
  ylab("Sleep Duration") +
  xlab("Stress Level")
library(car)
## Loading required package: carData
scatterplot(y = mydata$Sleep.Duration,
            x = mydata$Stress.Level,
            ylab = "Sleep duration in hours",
            xlab = "Stress level",
            smooth = FALSE,
            boxplot = FALSE)
To check whether there is a correlation between stress level (measured on a Likert-type scale from 1 to 10 and treated here as numerical) and sleep duration in hours, we can use Pearson’s correlation coefficient. We also checked with a scatterplot whether the relationship is linear and concluded that it is.
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
rcorr(as.matrix(mydata[, c(5,8)]),
type = "pearson")
## Sleep.Duration Stress.Level
## Sleep.Duration 1.00 -0.81
## Stress.Level -0.81 1.00
##
## n= 374
##
##
## P
## Sleep.Duration Stress.Level
## Sleep.Duration 0
## Stress.Level 0
H0: Population correlation coefficient of stress level and sleep duration is equal to zero.
H1: Population correlation coefficient of stress level and sleep duration is NOT equal to zero.
Based on the sample data we can reject H0 at p < 0.001. There is a statistically significant relationship between stress level and sleep duration. The linear relationship between stress level and sleep duration is negative and strong (r = -0.81).
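The P value of 0 in the output is a rounded value. As a rough check, the test statistic can be recomputed by hand from the reported correlation and sample size, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. This sketch uses the rounded r from the output, so the result is approximate.

```r
r <- -0.81  # sample correlation, rounded, as reported by rcorr()
n <- 374    # sample size

# t statistic for H0: population correlation equals zero
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

t_stat
p_value < 0.001
```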
head(mydata[colnames(mydata) %in% c("GenderF", "Sleep.Disorder")])
## Sleep.Disorder GenderF
## 1 None Male
## 2 None Male
## 3 None Male
## 4 Sleep Apnea Male
## 5 Sleep Apnea Male
## 6 Insomnia Male
results <- chisq.test(mydata$GenderF, mydata$Sleep.DisorderF,
correct = FALSE)
results
##
## Pearson's Chi-squared test
##
## data: mydata$GenderF and mydata$Sleep.DisorderF
## X-squared = 54.306, df = 2, p-value = 1.613e-12
H0: There is no association between GenderF and Sleep.DisorderF.
H1: There is an association between GenderF and Sleep.DisorderF.
Based on the sample data we can reject H0 (p < 0.001). We can claim that there is an association between gender and sleep disorder.
addmargins(results$observed) #Empirical frequencies
## mydata$Sleep.DisorderF
## mydata$GenderF None Insomnia Sleep Apnea Sum
## Male 137 41 11 189
## Female 82 36 67 185
## Sum 219 77 78 374
round(results$expected, 2) #Expected frequencies
## mydata$Sleep.DisorderF
## mydata$GenderF None Insomnia Sleep Apnea
## Male 110.67 38.91 39.42
## Female 108.33 38.09 38.58
round(results$residuals, 2) #Pearson residuals (results$stdres would give the standardized residuals)
## mydata$Sleep.DisorderF
## mydata$GenderF None Insomnia Sleep Apnea
## Male 2.50 0.33 -4.53
## Female -2.53 -0.34 4.57
2.50: More males than expected have no sleep disorder (α=0.05).
0.33 and -0.34: For insomnia, the observed numbers of males and females do not differ significantly from the expected numbers.
-4.53: Fewer males than expected have sleep apnea (α=0.001).
-2.53: Fewer females than expected have no sleep disorder (α=0.05).
4.57: More females than expected have sleep apnea (α=0.001).
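The residuals and their interpretation can be reproduced from the table of empirical frequencies alone. The sketch below rebuilds that table by hand (the object names `observed`, `res`, `pearson`, `crit_05` and `crit_001` are illustrative), checks that all expected frequencies are at least 5, and compares each residual, computed as (observed - expected) / sqrt(expected), with the normal critical values for α = 0.05 and α = 0.001.

```r
# Empirical frequencies copied from the table reported in this section
observed <- matrix(c(137, 41, 11,
                      82, 36, 67),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Male", "Female"),
                                   Sleep.Disorder = c("None", "Insomnia", "Sleep Apnea")))

res <- chisq.test(observed, correct = FALSE)

# Usual rule of thumb for the chi-squared test: expected frequencies >= 5
all(res$expected >= 5)

# Pearson residuals computed by hand
pearson <- (observed - res$expected) / sqrt(res$expected)

# Two-sided critical values of the standard normal distribution
crit_05  <- qnorm(1 - 0.05 / 2)
crit_001 <- qnorm(1 - 0.001 / 2)

round(pearson, 2)
abs(pearson) > crit_05   # cells that differ from expectation at alpha = 0.05
abs(pearson) > crit_001  # cells that differ from expectation at alpha = 0.001
```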
addmargins(round(prop.table(results$observed), 4))
## mydata$Sleep.DisorderF
## mydata$GenderF None Insomnia Sleep Apnea Sum
## Male 0.3663 0.1096 0.0294 0.5053
## Female 0.2193 0.0963 0.1791 0.4947
## Sum 0.5856 0.2059 0.2085 1.0000
Out of all 374 people, 10.96% are males who have insomnia.
addmargins(round(prop.table(results$observed,1), 3), 2)
## mydata$Sleep.DisorderF
## mydata$GenderF None Insomnia Sleep Apnea Sum
## Male 0.725 0.217 0.058 1.000
## Female 0.443 0.195 0.362 1.000
Out of all females, 44.3% do not have any sleep disorder.
addmargins(round(prop.table(results$observed, 2), 3), 1)
## mydata$Sleep.DisorderF
## mydata$GenderF None Insomnia Sleep Apnea
## Male 0.626 0.532 0.141
## Female 0.374 0.468 0.859
## Sum 1.000 1.000 1.000
Out of all people who have sleep apnea, 85.9% are females.
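The three tables of proportions differ only in the `margin` argument of `prop.table()`: no margin divides by the grand total, `margin = 1` divides by row totals, and `margin = 2` divides by column totals. A minimal sketch, again rebuilding the table of empirical frequencies by hand (object names are illustrative):

```r
# Empirical frequencies copied from the table reported in this section
observed <- matrix(c(137, 41, 11,
                      82, 36, 67),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Male", "Female"),
                                   Sleep.Disorder = c("None", "Insomnia", "Sleep Apnea")))

total_share <- prop.table(observed)              # share of all 374 people
row_share   <- prop.table(observed, margin = 1)  # share within each gender
col_share   <- prop.table(observed, margin = 2)  # share within each disorder group

round(total_share["Male", "Insomnia"], 4)       # males with insomnia among all people
round(row_share["Female", "None"], 3)           # females without a sleep disorder
round(col_share["Female", "Sleep Apnea"], 3)    # females among people with sleep apnea
```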
library(effectsize)
effectsize::cramers_v(mydata$GenderF, mydata$Sleep.DisorderF)
## Cramer's V (adj.) | 95% CI
## --------------------------------
## 0.37 | [0.28, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
#Calculating effect size
interpret_cramers_v(0.37)
## [1] "large"
## (Rules: funder2019)
Conclusion: There is an association between gender and sleep disorder. The effect size is large (Cramér's V = 0.37). Males are more likely than expected to have no sleep disorder, while females are more likely than expected to have sleep apnea.
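As a rough check of the effect size, the unadjusted Cramér's V can be computed by hand as sqrt(chi-squared / (n * (k - 1))), where k is the smaller of the number of rows and columns. This sketch uses the chi-squared statistic and sample size reported above; the `effectsize` output is labelled "adj.", so its value (0.37) is expected to be slightly lower than the unadjusted one.

```r
chi2 <- 54.306     # X-squared from the chi-squared test output
n    <- 374        # sample size
k    <- min(2, 3)  # smaller of: number of rows (2), number of columns (3)

# Unadjusted Cramér's V
v <- sqrt(chi2 / (n * (k - 1)))
round(v, 2)
```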