Data in the Dark: An Analysis of Sleep Study Data

Introduction

This project analyzes the provided “SleepStudy” dataset. Each of the research questions below is addressed with a Welch two-sample t-test, using concepts learned in class and a significance level of 0.05.

The 10 questions to be analyzed are as follows:

1) Is there a significant difference in the average GPA between male and female college students?
2) Is there a significant difference in the average number of early classes between the first two class years and other class years?
3) Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
4) Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
5) Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
6) Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
7) Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
8) Is there a significant difference in the average number of drinks per week between students of different genders?
9) Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?
10) Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

Analysis

Here, we explore each of the 10 questions above in detail. The first six rows of the dataset are shown below.

##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Q1 Is there a significant difference in the average GPA between male and female college students?

male = c(SleepStudyData$GPA[SleepStudyData$Gender == 1])
female = c(SleepStudyData$GPA[SleepStudyData$Gender == 0])
t.test(male, female, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  male and female
## t = -3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.30252780 -0.09982254
## sample estimates:
## mean of x mean of y 
##  3.123725  3.324901

result = t.test(male, female, var.equal = FALSE)
result$p.value

## [1] 0.000124298

Welch’s two-sample t-test shows that male students had a mean GPA of 3.12 and female students had a mean GPA of 3.32. The p-value (about 0.00012) is below 0.05, so the difference is statistically significant, with female students having the higher average GPA.
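To make clear what the Welch test computes, the following is a minimal sketch that rebuilds the t statistic, the degrees of freedom, and the two-sided p-value by hand and compares them with `t.test`. The vectors `x` and `y` are made-up toy values, not SleepStudy data, and the `*_manual` names are illustrative only.

```r
# Hypothetical toy samples (NOT taken from the SleepStudy dataset)
x <- c(3.1, 2.9, 3.4, 3.0, 3.6, 2.8)
y <- c(3.5, 3.3, 3.7, 3.2, 3.9)

# Each group's variance divided by its sample size (squared standard errors)
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over the unpooled standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# The built-in test should agree with the manual calculation
fit <- t.test(x, y, var.equal = FALSE)
all.equal(unname(fit$statistic), t_manual)
all.equal(unname(fit$parameter), df_manual)
all.equal(fit$p.value, p_manual)
```

Because the variances are not pooled, the degrees of freedom are generally not a whole number, which is why the outputs in this report show values such as df = 200.9.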

Q2 Is there a significant difference in the average number of early classes between the first two class years and other class years?

Year12 = c(SleepStudyData$NumEarlyClass[SleepStudyData$ClassYear %in% c(1,2)])
Year34 = c(SleepStudyData$NumEarlyClass[SleepStudyData$ClassYear %in% c(3,4)])
t.test(Year12, Year34, var.equal=FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  Year12 and Year34
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean of x mean of y 
##  2.070423  1.306306

result = t.test(Year12, Year34, var.equal=FALSE)
result$p.value

## [1] 4.009356e-05

Welch’s two-sample t-test shows that students in their first two class years took a mean of 2.07 early classes, compared with a mean of 1.31 for students in other class years. The p-value (about 0.00004) is far below 0.05, so the difference is statistically significant.
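Every question in this report follows the same pattern: split one column by the values of another, then run a Welch test on the two pieces. As a sketch only, that pattern could be wrapped in a small helper. The function `compare_means` and the data frame `toy` are hypothetical and not part of the original analysis; `toy` holds made-up values that merely reuse two SleepStudy column names.

```r
# Hypothetical helper: split `outcome` by membership of `group` in two
# sets of levels, then run Welch's two-sample t-test on the pieces.
compare_means <- function(data, outcome, group, levels_x, levels_y) {
  x <- data[[outcome]][data[[group]] %in% levels_x]
  y <- data[[outcome]][data[[group]] %in% levels_y]
  t.test(x, y, var.equal = FALSE)
}

# Toy data with made-up values (NOT the SleepStudy data)
toy <- data.frame(
  ClassYear     = c(1, 1, 2, 2, 3, 3, 4, 4),
  NumEarlyClass = c(3, 2, 2, 4, 1, 0, 2, 1)
)

fit <- compare_means(toy, "NumEarlyClass", "ClassYear", c(1, 2), c(3, 4))
fit$estimate  # group means: first two years, then other years
fit$p.value   # two-sided p-value
```

With the real data, the same call would take `SleepStudyData` as its first argument in place of `toy`.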

Q3 Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

larks <- c(SleepStudyData$CognitionZscore[SleepStudyData$LarkOwl == "Lark"])
owls <- c(SleepStudyData$CognitionZscore[SleepStudyData$LarkOwl == "Owl"])
t.test(larks, owls, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  larks and owls
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
##   mean of x   mean of y 
##  0.09024390 -0.03836735

result = t.test(larks, owls, var.equal = FALSE)
result$p.value

## [1] 0.4229482

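Welch’s two-sample t-test shows that the mean cognition z-score of students who identify as larks is 0.09, while the mean cognition z-score of students who identify as owls is -0.04. The two-sided p-value (0.423) is greater than 0.05, so the difference is not statistically significant. Because the question is directional (larks better than owls), a one-sided test is the closer match; since the observed difference is in the hypothesized direction, its p-value is half the two-sided value (about 0.21), which is still above 0.05.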
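For a directional question such as this one, `t.test` accepts an `alternative` argument. The sketch below uses made-up toy vectors (`larks_demo` and `owls_demo` are illustrative names, not SleepStudy data) to show how a one-sided Welch test relates to the two-sided one.

```r
# Hypothetical toy z-scores (NOT taken from the SleepStudy dataset)
larks_demo <- c(0.4, -0.1, 0.9, 0.2, 0.6, -0.3)
owls_demo  <- c(0.1, -0.5, 0.3, -0.2, 0.0, 0.5, -0.4)

# Two-sided test: is there any difference in means?
two_sided <- t.test(larks_demo, owls_demo, var.equal = FALSE)

# One-sided test: is the first group's mean greater than the second's?
one_sided <- t.test(larks_demo, owls_demo, var.equal = FALSE,
                    alternative = "greater")

two_sided$p.value
one_sided$p.value

# When the observed difference is in the hypothesized direction (t > 0),
# the one-sided p-value is exactly half the two-sided p-value.
all.equal(one_sided$p.value, two_sided$p.value / 2)
```

The direction matters: with `alternative = "greater"`, a negative t statistic would instead give a one-sided p-value above 0.5.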

Q4 Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

Early = c(SleepStudyData$ClassesMissed[SleepStudyData$EarlyClass == 1])
NotEarly = c(SleepStudyData$ClassesMissed[SleepStudyData$EarlyClass == 0])
t.test(Early, NotEarly, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  Early and NotEarly
## t = -1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.5412830  0.2233558
## sample estimates:
## mean of x mean of y 
##  1.988095  2.647059

result = t.test(Early, NotEarly, var.equal = FALSE)
result$p.value

## [1] 0.1421377

Welch’s two-sample t-test shows that the mean number of classes missed is 1.99 for students with at least one early class and 2.65 for students with no early classes. The p-value (0.142) is greater than 0.05, so the difference is not statistically significant.

Q5 Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

high = c(SleepStudyData$Happiness[SleepStudyData$DepressionStatus %in% c("moderate","severe")])
low = c(SleepStudyData$Happiness[SleepStudyData$DepressionStatus == "normal"])
t.test(high, low, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  high and low
## t = -5.6339, df = 55.594, p-value = 6.057e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.379724 -3.507836
## sample estimates:
## mean of x mean of y 
##  21.61364  27.05742

result = t.test(high, low, var.equal = FALSE)
result$p.value

## [1] 6.056559e-07

Welch’s two-sample t-test shows that the mean happiness score of students with moderate or severe depression is 21.61 and the mean happiness score of students with normal depression status is 27.06. The p-value (about 6.1e-07) is below 0.05, showing a significantly lower happiness score for students with moderate or severe depression.

Q6 Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

AllNighter = c(SleepStudyData$PoorSleepQuality[SleepStudyData$AllNighter == 1])
NoAllNighter = c(SleepStudyData$PoorSleepQuality[SleepStudyData$AllNighter == 0])
t.test(AllNighter, NoAllNighter, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  AllNighter and NoAllNighter
## t = 1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1608449  1.9456958
## sample estimates:
## mean of x mean of y 
##  7.029412  6.136986

result = t.test(AllNighter, NoAllNighter, var.equal = FALSE)
result$p.value

## [1] 0.09478991

Welch’s two-sample t-test shows that the mean PoorSleepQuality score was 7.03 for students who reported at least one all-nighter and 6.14 for students who did not. Higher values of this score indicate poorer sleep, so students who had not pulled an all-nighter had the better average sleep quality. However, the p-value (0.095) is greater than 0.05, so the difference is not statistically significant.

Q7 Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

NoAlch = c(SleepStudyData$StressScore[SleepStudyData$AlcoholUse == "Abstain"])
Alch = c(SleepStudyData$StressScore[SleepStudyData$AlcoholUse == "Heavy"])
t.test(NoAlch, Alch, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  NoAlch and Alch
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean of x mean of y 
##  8.970588 10.437500

result = t.test(NoAlch, Alch, var.equal = FALSE)
result$p.value

## [1] 0.5362324

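Welch’s two-sample t-test shows that the mean stress score for students who abstain from alcohol is 8.97 and the mean stress score for heavy drinkers is 10.44. Although the stress score for heavy drinkers is higher, the two-sided p-value (0.536) is greater than 0.05, so the difference is not statistically significant. The question is directional (abstainers having lower stress), and the observed difference is in that direction, so a one-sided test would give half the two-sided p-value (about 0.27), which is still above 0.05.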

Q8 Is there a significant difference in the average number of drinks per week between students of different genders?

male = c(SleepStudyData$Drinks[SleepStudyData$Gender == 1])
female = c(SleepStudyData$Drinks[SleepStudyData$Gender == 0])
t.test(male, female, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  male and female
## t = 6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.241601 4.360009
## sample estimates:
## mean of x mean of y 
##  7.539216  4.238411

result = t.test(male, female, var.equal = FALSE)
result$p.value

## [1] 7.001743e-09

Welch’s two-sample t-test shows that the mean number of drinks per week is 7.54 for male students and 4.24 for female students. The p-value (about 7.0e-09) is well below 0.05, showing a statistically significant difference in the number of drinks per week between male and female students.

Q9 Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?

HStress = c(SleepStudyData$WeekdayBed[SleepStudyData$Stress == "high"])
NStress = c(SleepStudyData$WeekdayBed[SleepStudyData$Stress == "normal"])
t.test(HStress, NStress, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  HStress and NStress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
## mean of x mean of y 
##  24.71500  24.88543

result = t.test(HStress, NStress, var.equal = FALSE)
result$p.value

## [1] 0.2855177

Welch’s two-sample t-test shows that the mean weekday bedtime is 24.72 (about 12:43 am) for students with high stress and 24.89 (about 12:53 am) for students with normal stress. Although students with high stress go to bed slightly earlier on weekdays, the p-value (0.286) is greater than 0.05, so the difference is not statistically significant.
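The bedtime values are decimal hours, and values above 24 are read here as times after midnight (an assumption consistent with the interpretation above). A small sketch of the conversion to clock time follows; `to_clock` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: convert decimal hours (e.g. 24.715) to "HH:MM",
# treating values of 24 or more as times after midnight.
to_clock <- function(hours) {
  total_min <- round((hours %% 24) * 60)       # minutes past midnight
  hh <- as.integer((total_min %/% 60) %% 24)   # wrap 24:00 back to 00:00
  mm <- as.integer(total_min %% 60)
  sprintf("%02d:%02d", hh, mm)
}

# Mean weekday bedtimes reported by the t-test output above
to_clock(c(24.71500, 24.88543))
# returns "00:43" "00:53"
```

The fractional part of an hour is multiplied by 60 to get minutes, so 0.715 of an hour is about 43 minutes.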

Q10 Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

First2 = c(SleepStudyData$WeekendSleep[SleepStudyData$ClassYear %in% c(1,2)])
Last2 = c(SleepStudyData$WeekendSleep[SleepStudyData$ClassYear %in% c(3,4)])
t.test(First2, Last2, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  First2 and Last2
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean of x mean of y 
##  8.213592  8.221892

result = t.test(First2, Last2, var.equal = FALSE)
result$p.value

## [1] 0.9618461

Welch’s two-sample t-test shows that students in their first two years get a mean of 8.21 hours of sleep on weekends, while students in their last two years get a mean of 8.22 hours. The p-value (0.962) is well above 0.05, so there is no statistically significant difference in weekend sleep between the two class-year groups.

Summary

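In conclusion, four of the ten comparisons were statistically significant at the 0.05 level: GPA by gender (Q1), number of early classes by class year (Q2), happiness by depression status (Q5), and drinks per week by gender (Q8). The remaining six comparisons (Q3, Q4, Q6, Q7, Q9, and Q10) showed no significant difference. This project helped me practice the concepts learned in class, apply problem-solving skills, and become more familiar with the functionality of Posit.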
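As a compact view of the results, the sketch below collects the ten two-sided p-values printed in the Analysis section and flags those below 0.05. The object names `p_values` and `summary_table` are illustrative only; the numbers are copied from the outputs above rather than recomputed.

```r
# Two-sided p-values copied from the t-test outputs in the Analysis section
p_values <- c(
  Q1 = 0.000124298, Q2 = 4.009356e-05, Q3 = 0.4229482,
  Q4 = 0.1421377,   Q5 = 6.056559e-07, Q6 = 0.09478991,
  Q7 = 0.5362324,   Q8 = 7.001743e-09, Q9 = 0.2855177,
  Q10 = 0.9618461
)

# One row per question, with a flag for significance at the 0.05 level
summary_table <- data.frame(
  question    = names(p_values),
  p_value     = signif(p_values, 3),
  significant = p_values < 0.05,
  row.names   = NULL,
  stringsAsFactors = FALSE
)

print(summary_table)
```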

Appendix

knitr::opts_chunk$set(echo = TRUE)
SleepStudyData = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(SleepStudyData)
male = c(SleepStudyData$GPA[SleepStudyData$Gender == 1])
female = c(SleepStudyData$GPA[SleepStudyData$Gender == 0])
t.test(male, female, var.equal = FALSE)
result = t.test(male, female, var.equal = FALSE)
result$p.value
Year12 = c(SleepStudyData$NumEarlyClass[SleepStudyData$ClassYear %in% c(1,2)])
Year34 = c(SleepStudyData$NumEarlyClass[SleepStudyData$ClassYear %in% c(3,4)])
t.test(Year12, Year34, var.equal=FALSE)
result = t.test(Year12, Year34, var.equal=FALSE)
result$p.value
larks <- c(SleepStudyData$CognitionZscore[SleepStudyData$LarkOwl == "Lark"])
owls <- c(SleepStudyData$CognitionZscore[SleepStudyData$LarkOwl == "Owl"])
t.test(larks, owls, var.equal = FALSE)
result = t.test(larks, owls, var.equal = FALSE)
result$p.value
Early = c(SleepStudyData$ClassesMissed[SleepStudyData$EarlyClass == 1])
NotEarly = c(SleepStudyData$ClassesMissed[SleepStudyData$EarlyClass == 0])
t.test(Early, NotEarly, var.equal = FALSE)
result = t.test(Early, NotEarly, var.equal = FALSE)
result$p.value
high = c(SleepStudyData$Happiness[SleepStudyData$DepressionStatus %in% c("moderate","severe")])
low = c(SleepStudyData$Happiness[SleepStudyData$DepressionStatus == "normal"])
t.test(high, low, var.equal = FALSE)
result = t.test(high, low, var.equal = FALSE)
result$p.value
AllNighter = c(SleepStudyData$PoorSleepQuality[SleepStudyData$AllNighter == 1])
NoAllNighter = c(SleepStudyData$PoorSleepQuality[SleepStudyData$AllNighter == 0])
t.test(AllNighter, NoAllNighter, var.equal = FALSE)
result = t.test(AllNighter, NoAllNighter, var.equal = FALSE)
result$p.value
NoAlch = c(SleepStudyData$StressScore[SleepStudyData$AlcoholUse == "Abstain"])
Alch = c(SleepStudyData$StressScore[SleepStudyData$AlcoholUse == "Heavy"])
t.test(NoAlch, Alch, var.equal = FALSE)
result = t.test(NoAlch, Alch, var.equal = FALSE)
result$p.value
male = c(SleepStudyData$Drinks[SleepStudyData$Gender == 1])
female = c(SleepStudyData$Drinks[SleepStudyData$Gender == 0])
t.test(male, female, var.equal = FALSE)
result = t.test(male, female, var.equal = FALSE)
result$p.value
HStress = c(SleepStudyData$WeekdayBed[SleepStudyData$Stress == "high"])
NStress = c(SleepStudyData$WeekdayBed[SleepStudyData$Stress == "normal"])
t.test(HStress, NStress, var.equal = FALSE)
result = t.test(HStress, NStress, var.equal = FALSE)
result$p.value
First2 = c(SleepStudyData$WeekendSleep[SleepStudyData$ClassYear %in% c(1,2)])
Last2 = c(SleepStudyData$WeekendSleep[SleepStudyData$ClassYear %in% c(3,4)])
t.test(First2, Last2, var.equal = FALSE)
result = t.test(First2, Last2, var.equal = FALSE)
result$p.value