Student: Katja Volk Štefič
Research question: Is there a difference in sleep duration between males and females?
mydata <- read.table("./Sleep_health_and_lifestyle_dataset.csv", header=TRUE, sep = ",", dec = ".")
head(mydata)
##   Person.ID Gender Age           Occupation Sleep.Duration
## 1         1   Male  27    Software Engineer            6.1
## 2         2   Male  28               Doctor            6.2
## 3         3   Male  28               Doctor            6.2
## 4         4   Male  28 Sales Representative            5.9
## 5         5   Male  28 Sales Representative            5.9
## 6         6   Male  28    Software Engineer            5.9
##   Quality.of.Sleep Physical.Activity.Level Stress.Level BMI.Category
## 1                6                      42            6   Overweight
## 2                6                      60            8       Normal
## 3                6                      60            8       Normal
## 4                4                      30            8        Obese
## 5                4                      30            8        Obese
## 6                4                      30            8        Obese
##   Blood.Pressure Heart.Rate Daily.Steps Sleep.Disorder
## 1         126/83         77        4200           None
## 2         125/80         75       10000           None
## 3         125/80         75       10000           None
## 4         140/90         85        3000    Sleep Apnea
## 5         140/90         85        3000    Sleep Apnea
## 6         140/90         85        3000       Insomnia
The unit of observation is one participant in the study.
The sample size is 374.
Explanation of data:
Person ID: An identifier for each individual.
Gender: The gender of the person (Male/Female).
Age: The age of the person in years.
Occupation: The occupation or profession of the person.
Sleep Duration: The number of hours the person sleeps per day.
Quality of Sleep: A subjective rating of the quality of sleep, ranging from 1 to 10.
Physical Activity Level: The number of minutes the person engages in physical activity daily.
Stress Level: A subjective rating of the stress level experienced by the person, ranging from 1 to 10.
BMI Category: The BMI category of the person (e.g., Underweight, Normal, Overweight).
Blood Pressure (systolic/diastolic): The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.
Heart Rate: The resting heart rate of the person in beats per minute.
Daily Steps: The number of steps the person takes per day.
Sleep Disorder: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).
The source of the data is Kaggle.
# Changing variables into factors
mydata$GenderF <- factor(mydata$Gender,
                         levels = c("Male", "Female"),
                         labels = c("Male", "Female"))
mydata$OccupationF <- factor(mydata$Occupation,
                             levels = c("Accountant", "Doctor", "Engineer", "Lawyer", "Manager", "Nurse", "Sales Representative", "Salesperson", "Scientist", "Software Engineer", "Teacher"),
                             labels = c("Accountant", "Doctor", "Engineer", "Lawyer", "Manager", "Nurse", "Sales Representative", "Salesperson", "Scientist", "Software Engineer", "Teacher"))
mydata$BMI.CategoryF <- factor(mydata$BMI.Category,
                               levels = c("Normal", "Normal Weight", "Obese", "Overweight"),
                               labels = c("Normal", "Normal Weight", "Obese", "Overweight"))
mydata$Sleep.DisorderF <- factor(mydata$Sleep.Disorder,
                                 levels = c("Insomnia", "None", "Sleep Apnea"),
                                 labels = c("Insomnia", "None", "Sleep Apnea"))
summary(mydata[ , c(-1, -2, -4, -9, -10, -13)])
##       Age        Sleep.Duration  Quality.of.Sleep
##  Min.   :27.00   Min.   :5.800   Min.   :4.000
##  1st Qu.:35.25   1st Qu.:6.400   1st Qu.:6.000
##  Median :43.00   Median :7.200   Median :7.000
##  Mean   :42.18   Mean   :7.132   Mean   :7.313
##  3rd Qu.:50.00   3rd Qu.:7.800   3rd Qu.:8.000
##  Max.   :59.00   Max.   :8.500   Max.   :9.000
##
##  Physical.Activity.Level  Stress.Level     Heart.Rate
##  Min.   :30.00            Min.   :3.000   Min.   :65.00
##  1st Qu.:45.00            1st Qu.:4.000   1st Qu.:68.00
##  Median :60.00            Median :5.000   Median :70.00
##  Mean   :59.17            Mean   :5.385   Mean   :70.17
##  3rd Qu.:75.00            3rd Qu.:7.000   3rd Qu.:72.00
##  Max.   :90.00            Max.   :8.000   Max.   :86.00
##
##   Daily.Steps      GenderF        OccupationF        BMI.CategoryF
##  Min.   : 3000   Male  :189   Nurse     :73   Normal       :195
##  1st Qu.: 5600   Female:185   Doctor    :71   Normal Weight: 21
##  Median : 7000                Engineer  :63   Obese        : 10
##  Mean   : 6817                Lawyer    :47   Overweight   :148
##  3rd Qu.: 8000                Teacher   :40
##  Max.   :10000                Accountant:37
##                               (Other)   :43
##     Sleep.DisorderF
##  Insomnia   : 77
##  None       :219
##  Sleep Apnea: 78
##
##
##
##
#Descriptive statistics
Explanation of estimates of parameters:
Age: Mean:42.18
The average age of participants in this study was 42.18 years.
Sleep duration: Min:5.800
The minimal sleep time was 5.800 hours per day.
Daily steps: 1st Qu.:5600
25% of participants did 5600 steps per day or less and 75% of
participants did more than 5600 steps per day.
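The reading of the first quartile can be illustrated with a small sketch. The values below are hypothetical toy data, not taken from the study, and the variable names are chosen only for illustration.

```r
# Toy data (hypothetical values, not taken from the study)
steps <- c(3000, 5000, 5600, 6000, 7000, 8000, 10000)

# First quartile: roughly 25% of observations lie at or below this value
q1 <- quantile(steps, probs = 0.25)

# Share of observations at or below the first quartile
share_below <- mean(steps <= q1)

print(q1)
print(share_below)
```

With small samples the share at or below the quartile is only approximately 25%, because quantile() interpolates between neighbouring observations by default.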
Because my research question asks whether sleep duration differs between males and females, I only need the variables Gender and Sleep.Duration.
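To perform a parametric test (the independent samples t-test), these assumptions have to be met: the dependent variable is numeric, it is normally distributed in both populations, and the observations are independent. Equal variances are not required when the Welch correction is used.
Dependent variable: sleep duration (numeric). Grouping variable: gender (categorical).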
library(psych)
describeBy(mydata$Sleep.Duration, mydata$GenderF)
##
## Descriptive statistics by group
## group: Male
## vars n mean sd median trimmed mad min max range skew
## X1 1 189 7.04 0.69 7.2 7.06 0.89 5.9 8.1 2.2 -0.27
## kurtosis se
## X1 -1.53 0.05
## ----------------------------------------------------
## group: Female
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 185 7.23 0.88 7.2 7.23 1.33 5.8 8.5 2.7 0.07 -1.48
## se
## X1 0.06
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
Sleep.Duration_male <- ggplot(mydata[mydata$GenderF == "Male", ], aes(x = Sleep.Duration)) +
theme_linedraw() +
geom_histogram(binwidth = 0.2, colour= "lightskyblue3", fill="lightskyblue1") +
ggtitle("Sleep_duration_male")
Sleep.Duration_female <- ggplot(mydata[mydata$GenderF == "Female", ], aes(x = Sleep.Duration)) +
theme_linedraw() +
geom_histogram(binwidth = 0.2, colour= "plum3", fill="plum2") +
ggtitle("Sleep_duration_female")
library(ggpubr)
ggarrange(Sleep.Duration_male, Sleep.Duration_female,
ncol = 2, nrow = 1)
The distribution of sleep duration does not look normal for either males or females. We performed the Shapiro-Wilk test to check this formally.
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
mydata %>%
group_by(GenderF) %>%
shapiro_test(Sleep.Duration)
## # A tibble: 2 × 4
## GenderF variable statistic p
## <fct> <chr> <dbl> <dbl>
## 1 Male Sleep.Duration 0.872 1.51e-11
## 2 Female Sleep.Duration 0.899 6.36e-10
H0: Sleep duration is normally distributed.
H1: Sleep duration is not normally distributed.
We reject H0 for males at p < 0.001. We reject H0 for females at p < 0.001.
Because the normality assumption is not met, we have to perform a non-parametric test, the Wilcoxon Rank Sum Test.
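The same per-group normality check can be sketched in base R, without rstatix. The data here are simulated and purely hypothetical; the names toy, p_values and alpha are assumptions made for this illustration.

```r
set.seed(1)  # make the simulated sample reproducible

# Simulated (hypothetical) data: two groups drawn from a uniform distribution,
# which is not normal
toy <- data.frame(
  group = rep(c("A", "B"), each = 100),
  value = runif(200, min = 5.8, max = 8.5)
)

# Shapiro-Wilk p-value for each group, using base R only
p_values <- tapply(toy$value, toy$group, function(v) shapiro.test(v)$p.value)

# Normality is rejected for a group when its p-value is below the chosen alpha
alpha <- 0.05
reject_normality <- p_values < alpha

print(p_values)
print(reject_normality)
```

If normality is rejected in at least one group, a non-parametric alternative to the t-test is the safer choice.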
library(ggplot2)
ggplot(mydata, aes(x = Sleep.Duration, fill = GenderF)) +
  geom_histogram(position = "dodge", binwidth = 0.2, colour = "black") +
  # Axis breaks cover the observed range of sleep duration (5.8 to 8.5 hours)
  scale_x_continuous(breaks = seq(5.5, 8.5, 0.5)) +
  ylab("Frequency") +
  labs(fill = "Gender") +
  scale_fill_manual(values = c("Male" = "cadetblue1", "Female" = "orange"))
wilcox.test(mydata$Sleep.Duration ~ mydata$GenderF,
paired = FALSE,
correct = FALSE,
exact= FALSE,
alternative = "two.sided")
##
## Wilcoxon rank sum test
##
## data: mydata$Sleep.Duration by mydata$GenderF
## W = 14930, p-value = 0.01442
## alternative hypothesis: true location shift is not equal to 0
H0: The locations of the distributions of sleep duration are the same for males and females.
H1: The locations of the distributions of sleep duration are different for males and females.
Based on the sample data, we reject H0 at p = 0.014 and conclude that sleep duration differs between males and females.
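To show where the W statistic comes from, here is a minimal sketch on hypothetical toy data without ties. W is the sum of the first group's ranks minus the smallest value that sum could take, n1(n1 + 1)/2.

```r
# Toy data (hypothetical values, no ties)
x <- c(6.1, 6.3, 7.0, 7.4)        # first group
y <- c(6.5, 7.2, 7.8, 8.0, 8.4)   # second group

# Rank all observations together
r <- rank(c(x, y))

# W = sum of the first group's ranks minus its smallest possible value
n1 <- length(x)
W_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# The same statistic as reported by wilcox.test()
W_builtin <- unname(wilcox.test(x, y, exact = FALSE, correct = FALSE)$statistic)

print(c(manual = W_manual, builtin = W_builtin))
```

Equivalently, W counts the pairs in which an observation from the first group exceeds one from the second group.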
library(effectsize)
##
## Attaching package: 'effectsize'
## The following objects are masked from 'package:rstatix':
##
## cohens_d, eta_squared
## The following object is masked from 'package:psych':
##
## phi
effectsize(wilcox.test(mydata$Sleep.Duration ~ mydata$GenderF,
paired = FALSE,
correct = FALSE,
exact= FALSE,
alternative = "two.sided"))
## r (rank biserial) | 95% CI
## ----------------------------------
## -0.15 | [-0.26, -0.03]
interpret_rank_biserial(0.15)
## [1] "small"
## (Rules: funder2019)
Conclusion: Based on the sample data, we find that men and women differ in sleep duration (p = 0.014). Women tend to sleep longer, and the effect size is small (r = -0.15).
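As a rough check, the rank-biserial correlation can be recovered from the reported W and the group sizes. This sketch assumes the standard relation r = 2W / (n1 * n2) - 1 and is not necessarily how the effectsize package computes it internally.

```r
W  <- 14930  # Wilcoxon statistic reported by wilcox.test()
n1 <- 189    # number of males
n2 <- 185    # number of females

# Rank-biserial correlation from W (assumed standard relation)
r_rb <- 2 * W / (n1 * n2) - 1

print(round(r_rb, 2))
```

The negative sign reflects the order of the factor levels: the first group (Male) tends to have lower values than the second group (Female).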
If sleep duration were normally distributed, we would perform a parametric test, the independent samples t-test.
t.test(mydata$Sleep.Duration ~ mydata$GenderF,
paired = FALSE,
var.equal = FALSE,
alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: mydata$Sleep.Duration by mydata$GenderF
## t = -2.3565, df = 349.38, p-value = 0.019
## alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
## 95 percent confidence interval:
## -0.35448564 -0.03195795
## sample estimates:
## mean in group Male mean in group Female
## 7.036508 7.229730
H0: The difference in means between males and females is equal to zero.
H1: The difference in means between males and females is not equal to zero.
We reject H0 at p = 0.019.
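The Welch statistic can be approximately reproduced from the group summaries. This sketch uses the means from the t.test() output and the standard deviations from describeBy(), which are rounded to two decimals, so the result matches the reported values only approximately.

```r
# Group summaries taken from the outputs above (sd values are rounded)
m1 <- 7.036508; s1 <- 0.69; n1 <- 189   # males
m2 <- 7.229730; s2 <- 0.88; n2 <- 185   # females

# Standard error of the difference in means (variances not pooled)
se <- sqrt(s1^2 / n1 + s2^2 / n2)

# Welch t statistic
t_stat <- (m1 - m2) / se

# Welch-Satterthwaite degrees of freedom
df_welch <- se^4 / ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```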
library(effectsize)
effectsize::cohens_d(mydata$Sleep.Duration ~ mydata$GenderF,
pooled_sd = FALSE)
## Cohen's d | 95% CI
## --------------------------
## -0.24 | [-0.45, -0.04]
##
## - Estimated using un-pooled SD.
interpret_cohens_d(0.24, rules = "sawilowsky2009")
## [1] "small"
## (Rules: sawilowsky2009)
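Cohen's d with un-pooled standard deviations can be sketched from the same group summaries. This assumes that the un-pooled denominator is the square root of the average of the two group variances. The standard deviations are the rounded values from describeBy(), so the result is approximate.

```r
# Group summaries taken from the outputs above (sd values are rounded)
m1 <- 7.036508; s1 <- 0.69   # males
m2 <- 7.229730; s2 <- 0.88   # females

# Un-pooled SD: square root of the mean of the two group variances (assumed)
sd_unpooled <- sqrt((s1^2 + s2^2) / 2)

# Cohen's d: standardized difference in means
d <- (m1 - m2) / sd_unpooled

print(round(d, 2))
```

The result agrees with the reported estimate of -0.24, which falls in the "small" range under the sawilowsky2009 rules.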