Student: Katja Volk Štefić

mydata <- read.table("./Sleep_health_and_lifestyle_dataset.csv", header=TRUE, sep = ",", dec = ".")

head(mydata)
##   Person.ID Gender Age           Occupation Sleep.Duration
## 1         1   Male  27    Software Engineer            6.1
## 2         2   Male  28               Doctor            6.2
## 3         3   Male  28               Doctor            6.2
## 4         4   Male  28 Sales Representative            5.9
## 5         5   Male  28 Sales Representative            5.9
## 6         6   Male  28    Software Engineer            5.9
##   Quality.of.Sleep Physical.Activity.Level Stress.Level BMI.Category
## 1                6                      42            6   Overweight
## 2                6                      60            8       Normal
## 3                6                      60            8       Normal
## 4                4                      30            8        Obese
## 5                4                      30            8        Obese
## 6                4                      30            8        Obese
##   Blood.Pressure Heart.Rate Daily.Steps Sleep.Disorder
## 1         126/83         77        4200           None
## 2         125/80         75       10000           None
## 3         125/80         75       10000           None
## 4         140/90         85        3000    Sleep Apnea
## 5         140/90         85        3000    Sleep Apnea
## 6         140/90         85        3000       Insomnia

The unit of observation is one participant in the study.

The sample size is 374.

Explanation of data:

The source of the data is Kaggle.

mydata$GenderF <- factor(mydata$Gender, 
                         levels = c("Male", "Female"), 
                         labels = c("Male", "Female"))
mydata$OccupationF <- factor(mydata$Occupation, 
                             levels = c("Accountant","Doctor", "Engineer", "Lawyer", "Manager", "Nurse", "Sales Representative", "Salesperson", "Scientist", "Software Engineer", "Teacher"),
                             labels = c("Accountant","Doctor", "Engineer", "Lawyer", "Manager", "Nurse", "Sales Representative", "Salesperson", "Scientist", "Software Engineer", "Teacher"))
mydata$BMI.CategoryF <- factor(mydata$BMI.Category,
                              levels = c("Normal", "Normal Weight", "Obese", "Overweight"),
                              labels = c("Normal", "Normal Weight", "Obese", "Overweight"))
mydata$Sleep.DisorderF <- factor(mydata$Sleep.Disorder,
                                 levels = c("None", "Insomnia", "Sleep Apnea"),
                                 labels = c("None", "Insomnia", "Sleep Apnea"))
#Changing variables into factors
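As a small illustration of the conversion above, the sketch below applies `factor()` to the six `Sleep.Disorder` values printed by `head(mydata)`. The vector names `sleep_disorder` and `sleep_disorder_f` are introduced here only for the example. It shows that the order given in `levels` controls the order of the categories in later tables, and that `labels` can be left out when the labels are identical to the levels.

```r
# First six Sleep.Disorder values, copied from the head(mydata) output
sleep_disorder <- c("None", "None", "None",
                    "Sleep Apnea", "Sleep Apnea", "Insomnia")

# labels is omitted on purpose: it defaults to the levels themselves
sleep_disorder_f <- factor(sleep_disorder,
                           levels = c("None", "Insomnia", "Sleep Apnea"))

levels(sleep_disorder_f)  # categories in the order given in levels
table(sleep_disorder_f)   # counts follow the same order
```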
summary(mydata[ , c(-1, -2, -4, -9, -10, -13)]) 
##       Age        Sleep.Duration  Quality.of.Sleep
##  Min.   :27.00   Min.   :5.800   Min.   :4.000   
##  1st Qu.:35.25   1st Qu.:6.400   1st Qu.:6.000   
##  Median :43.00   Median :7.200   Median :7.000   
##  Mean   :42.18   Mean   :7.132   Mean   :7.313   
##  3rd Qu.:50.00   3rd Qu.:7.800   3rd Qu.:8.000   
##  Max.   :59.00   Max.   :8.500   Max.   :9.000   
##                                                  
##  Physical.Activity.Level  Stress.Level     Heart.Rate   
##  Min.   :30.00           Min.   :3.000   Min.   :65.00  
##  1st Qu.:45.00           1st Qu.:4.000   1st Qu.:68.00  
##  Median :60.00           Median :5.000   Median :70.00  
##  Mean   :59.17           Mean   :5.385   Mean   :70.17  
##  3rd Qu.:75.00           3rd Qu.:7.000   3rd Qu.:72.00  
##  Max.   :90.00           Max.   :8.000   Max.   :86.00  
##                                                         
##   Daily.Steps      GenderF        OccupationF       BMI.CategoryF
##  Min.   : 3000   Male  :189   Nurse     :73   Normal       :195  
##  1st Qu.: 5600   Female:185   Doctor    :71   Normal Weight: 21  
##  Median : 7000                Engineer  :63   Obese        : 10  
##  Mean   : 6817                Lawyer    :47   Overweight   :148  
##  3rd Qu.: 8000                Teacher   :40                      
##  Max.   :10000                Accountant:37                      
##                               (Other)   :43                      
##     Sleep.DisorderF
##  None       :219   
##  Insomnia   : 77   
##  Sleep Apnea: 78   
##                    
##                    
##                    
## 
#Descriptive statistics

Daily steps - Median:7000

Half of the participants take up to 7000 steps per day; the other half take more than 7000 steps per day.

Age - 3rd Qu.:50.00

75% of participants are 50 years old or younger; 25% of participants are older than 50.

Physical activity level - Max.:90.00

The maximum physical activity level (number of minutes the person engages in physical activity daily) is 90.
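The statistics interpreted above can also be requested one at a time with `median()` and `quantile()`. The sketch below uses only the six `Daily.Steps` values shown in the `head(mydata)` output (the name `steps_head` is introduced for the example), so its results differ from the full-sample values in the summary.

```r
# Daily.Steps for the first six participants, copied from head(mydata)
steps_head <- c(4200, 10000, 10000, 3000, 3000, 3000)

# Median: half of the values are at or below it
steps_median <- median(steps_head)

# Third quartile: 75% of the values are at or below it
# (default interpolation, type = 7, the same one summary() uses)
steps_q3 <- quantile(steps_head, probs = 0.75, names = FALSE)

steps_median
steps_q3
```

On the full data set the same calls would be `median(mydata$Daily.Steps)` and `quantile(mydata$Daily.Steps, 0.75)`.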

Research question 1: Is there a correlation between stress level and sleep duration?

library(ggplot2)

ggplot(data = mydata, aes(x = Stress.Level, y = Sleep.Duration)) +
  geom_bar(stat = "summary", fun = "mean", na.rm = TRUE) +
  ylab("Sleep Duration") +
  xlab("Stress Level")

library(car)
## Loading required package: carData
scatterplot(y= mydata$Sleep.Duration,
            x= mydata$Stress.Level,
            ylab= "Sleep duration in hours",
            xlab= "Stress level",
            smooth = FALSE,
            boxplot= FALSE)

To check whether stress level (measured on a Likert scale) and sleep duration in hours are correlated, we treat both as numerical variables and use Pearson’s correlation coefficient. Pearson’s coefficient assumes a linear relationship, so we checked this with the scatterplot and concluded that the relationship is approximately linear.

library(Hmisc)
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
rcorr(as.matrix(mydata[, c(5,8)]),
      type = "pearson")
##                Sleep.Duration Stress.Level
## Sleep.Duration           1.00        -0.81
## Stress.Level            -0.81         1.00
## 
## n= 374 
## 
## 
## P
##                Sleep.Duration Stress.Level
## Sleep.Duration                 0          
## Stress.Level    0

H0: The population correlation coefficient of stress level and sleep duration is equal to zero.

H1: The population correlation coefficient of stress level and sleep duration is NOT equal to zero.

Based on the sample data we can reject H0 at p < 0.001. There is a statistically significant relationship between stress level and sleep duration. The linear relationship between stress level and sleep duration is negative and strong (r = -0.81).
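The `P` value printed as `0` by `rcorr()` is a rounded value, not an exact zero. As a rough check, the sketch below recomputes the usual t-test for a Pearson correlation from the two numbers reported in the output, r = -0.81 and n = 374. Because r is taken in its rounded form, the result is approximate; the variable names are introduced only for this example.

```r
r <- -0.81   # sample correlation, rounded, from the rcorr() output
n <- 374     # sample size, from the rcorr() output

# Test statistic for H0: population correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

t_stat
p_value
```

The statistic is about -26.6 on 372 degrees of freedom, so the p-value is far below 0.001. On the full data, `cor.test(mydata$Sleep.Duration, mydata$Stress.Level)` reports the same test together with a confidence interval for r.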

Research question 2: Is there a relationship between gender and sleep disorder?

head(mydata[colnames(mydata) %in% c("GenderF", "Sleep.Disorder")])
##   Sleep.Disorder GenderF
## 1           None    Male
## 2           None    Male
## 3           None    Male
## 4    Sleep Apnea    Male
## 5    Sleep Apnea    Male
## 6       Insomnia    Male
results <- chisq.test(mydata$GenderF, mydata$Sleep.DisorderF,
                      correct = FALSE)
results
## 
##  Pearson's Chi-squared test
## 
## data:  mydata$GenderF and mydata$Sleep.DisorderF
## X-squared = 54.306, df = 2, p-value = 1.613e-12

H0: There is no association between gender and sleep disorder.

H1: There is an association between gender and sleep disorder.

Based on the sample data we can reject H0 (p < 0.001). We can claim that there is an association between gender and sleep disorder.

addmargins(results$observed) #Empirical frequencies
##               mydata$Sleep.DisorderF
## mydata$GenderF None Insomnia Sleep Apnea Sum
##         Male    137       41          11 189
##         Female   82       36          67 185
##         Sum     219       77          78 374
round(results$expected, 2) #Expected frequencies
##               mydata$Sleep.DisorderF
## mydata$GenderF   None Insomnia Sleep Apnea
##         Male   110.67    38.91       39.42
##         Female 108.33    38.09       38.58
round(results$residuals, 2) #Pearson residuals: (observed - expected) / sqrt(expected)
##               mydata$Sleep.DisorderF
## mydata$GenderF  None Insomnia Sleep Apnea
##         Male    2.50     0.33       -4.53
##         Female -2.53    -0.34        4.57

2.50: There are more males without a sleep disorder than expected (α = 0.05).

0.33 and -0.34: The observed numbers of males and females with insomnia do not differ significantly from the expected numbers, so we cannot say that gender is related to insomnia.

-4.53: There are fewer males with sleep apnea than expected (α = 0.001).

-2.53: There are fewer females without a sleep disorder than expected (α = 0.05).

4.57: There are more females with sleep apnea than expected (α = 0.001).
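To make the link between the three tables explicit, the sketch below rebuilds the test by hand from the observed frequencies printed above. The object names (`observed`, `expected`, `pearson_res`, `chi_sq`, `dof`) are introduced only for this example; the counts are copied from the output of `addmargins(results$observed)`.

```r
# Observed frequencies, copied from the empirical frequency table
observed <- matrix(c(137, 41, 11,
                      82, 36, 67),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Male", "Female"),
                                   c("None", "Insomnia", "Sleep Apnea")))

n <- sum(observed)

# Expected frequency under independence: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / n

# Residual per cell: (observed - expected) / sqrt(expected)
pearson_res <- (observed - expected) / sqrt(expected)

# The chi-squared statistic is the sum of the squared residuals
chi_sq <- sum(pearson_res^2)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

round(expected, 2)
round(pearson_res, 2)
round(chi_sq, 3)
```

The cells with the largest absolute residuals (male and female with sleep apnea) contribute most to the test statistic, which is why they carry the interpretation.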

addmargins(round(prop.table(results$observed), 4))
##               mydata$Sleep.DisorderF
## mydata$GenderF   None Insomnia Sleep Apnea    Sum
##         Male   0.3663   0.1096      0.0294 0.5053
##         Female 0.2193   0.0963      0.1791 0.4947
##         Sum    0.5856   0.2059      0.2085 1.0000

Out of all 374 participants, 10.96% are males who have insomnia.

addmargins(round(prop.table(results$observed,1), 3), 2)
##               mydata$Sleep.DisorderF
## mydata$GenderF  None Insomnia Sleep Apnea   Sum
##         Male   0.725    0.217       0.058 1.000
##         Female 0.443    0.195       0.362 1.000

Out of all females, 44.3% do not have any sleep disorder.

addmargins(round(prop.table(results$observed, 2), 3), 1)
##               mydata$Sleep.DisorderF
## mydata$GenderF  None Insomnia Sleep Apnea
##         Male   0.626    0.532       0.141
##         Female 0.374    0.468       0.859
##         Sum    1.000    1.000       1.000

Out of all people who have sleep apnea, 85.9% are females.
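The three tables above differ only in the denominator, which is set by the `margin` argument of `prop.table()`. The sketch below repeats the three interpreted values using the observed counts from the frequency table; the object names are introduced only for this example.

```r
# Observed frequencies, copied from the empirical frequency table
observed <- matrix(c(137, 41, 11,
                      82, 36, 67),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Male", "Female"),
                                   c("None", "Insomnia", "Sleep Apnea")))

# No margin: each cell divided by the grand total (374)
share_of_all <- prop.table(observed)

# margin = 1: each cell divided by its row total (within gender)
share_within_gender <- prop.table(observed, margin = 1)

# margin = 2: each cell divided by its column total (within disorder)
share_within_disorder <- prop.table(observed, margin = 2)

round(share_of_all["Male", "Insomnia"], 4)              # 41 / 374
round(share_within_gender["Female", "None"], 3)         # 82 / 185
round(share_within_disorder["Female", "Sleep Apnea"], 3) # 67 / 78
```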

library(effectsize)
effectsize::cramers_v(mydata$GenderF, mydata$Sleep.DisorderF)
## Cramer's V (adj.) |       95% CI
## --------------------------------
## 0.37              | [0.28, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
#Calculating effect size
interpret_cramers_v(0.37)
## [1] "large"
## (Rules: funder2019)

Conclusion: There is an association between gender and sleep disorder. The effect size is large (Cramér's V = 0.37). Males are more likely than expected to have no sleep disorder, while females are more likely than expected to have sleep apnea.
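As a rough cross-check of the effect size, the sketch below computes Cramér's V from the reported test statistic (X-squared = 54.306, n = 374, 2 x 3 table). The second part applies a small-sample bias correction; that this is the correction behind the "(adj.)" label in the `effectsize` output is an assumption here, not something the output states. All variable names are introduced only for this example.

```r
chi_sq <- 54.306  # from the chisq.test() output
n      <- 374     # sample size
n_rows <- 2       # gender categories
n_cols <- 3       # sleep disorder categories

# Unadjusted Cramér's V
v <- sqrt(chi_sq / (n * min(n_rows - 1, n_cols - 1)))

# Bias-corrected version (assumed to correspond to the "(adj.)" output)
phi2_adj <- max(0, chi_sq / n - (n_rows - 1) * (n_cols - 1) / (n - 1))
rows_adj <- n_rows - (n_rows - 1)^2 / (n - 1)
cols_adj <- n_cols - (n_cols - 1)^2 / (n - 1)
v_adj <- sqrt(phi2_adj / min(rows_adj - 1, cols_adj - 1))

round(v, 2)
round(v_adj, 2)
```

The unadjusted value rounds to 0.38 and the corrected one to 0.37, which agrees with the reported effect size; with a sample of this size the correction changes little.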