Introduction

In this project, we analyze a data set of statistics on the sleeping habits of college students, guided by a set of ten research questions. By exploring these questions, we aim to offer a thorough understanding of sleep patterns and their associated factors among college students, ultimately supporting their well-being and academic achievement.

These are the ten questions to be analyzed using the data:

  1. Is there a significant difference in the average GPA between male and female college students?

  2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

  3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

  4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class and those who didn’t?

  5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

  6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter and those who didn’t?

  7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

  8. Is there a significant difference in the average number of drinks per week between students of different genders?

  9. Is there a significant difference in the average weekday bedtime between students with high and low stress?

  10. Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

Analysis

The first step is to load the data set. For convenience, all empty values were removed from the initial, raw data set beforehand. As a result, the chunks of code below can work with the already “cleaned” data and do not need to remove empty values themselves.

# call in data set
data <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# display first six rows
head(data)
##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0
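The text above says that empty values were removed before the analysis. As a minimal sketch of how such a check and clean-up could look, the block below uses a small made-up data frame (the `toy` object and its values are hypothetical and are not study data):

```r
# hypothetical toy data frame with a few missing entries (not study data)
toy <- data.frame(
  GPA    = c(3.60, NA, 2.97, 3.76),
  Drinks = c(10, 6, NA, 2)
)

# count the missing values in each column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# keep only the rows that have no missing values
toy_clean <- na.omit(toy)
print(nrow(toy_clean))
```

The same two calls, `colSums(is.na(data))` and `na.omit(data)`, could be applied to the loaded data set to confirm that no empty values remain.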

Question 1: Is there a significant difference in the average GPA between male and female college students?

# perform t-test
t_test_1 <- t.test(GPA ~ Gender, data = data, alternative = "two.sided")

# display t-test results
print(t_test_1)
## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725

The t-test yields a p-value (0.0001243) that is less than 0.05. This indicates a significant difference in the average GPA between male and female college students: the sample mean is 3.32 in group 0 and 3.12 in group 1.
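The decision for Question 1 can be read from either the p-value or the confidence interval. The sketch below copies the numbers from the printed output of `t_test_1`; the variable names are illustrative only.

```r
# values copied from the printed t-test output for Question 1
mean_group0 <- 3.324901
mean_group1 <- 3.123725
ci_95       <- c(0.09982254, 0.30252780)
p_value     <- 0.0001243
alpha       <- 0.05

# estimated difference in mean GPA (group 0 minus group 1)
mean_difference <- mean_group0 - mean_group1
print(round(mean_difference, 3))

# a two-sided test at level alpha rejects exactly when the
# (1 - alpha) confidence interval excludes zero
reject_by_p  <- p_value < alpha
reject_by_ci <- ci_95[1] > 0 || ci_95[2] < 0
print(c(reject_by_p, reject_by_ci))
```

On the fitted object itself, the same quantities are available as `t_test_1$estimate`, `t_test_1$conf.int`, and `t_test_1$p.value`.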

Question 2: Is there a significant difference in the average number of early classes between the first two class years and other class years?

# define data group
data$ClassGroup <- ifelse(data$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")

# perform t-test
t_test_2 <- t.test(NumEarlyClass ~ ClassGroup, data = data)

# display t-test results
print(t_test_2)
## 
##  Welch Two Sample t-test
## 
## data:  NumEarlyClass by ClassGroup
## t = 4.1813, df = 250.69, p-value = 0.00004009
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    2.070423                    1.306306

The t-test yields a p-value (0.00004009) that is less than 0.05. This indicates a significant difference in the average number of early classes between the first two class years (sample mean 2.07) and other class years (sample mean 1.31).

Question 3: Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

# define data subset
data_subset <- subset(data, LarkOwl %in% c("Lark", "Owl"))

# perform t-test
t_test_3 <- t.test(CognitionZscore ~ LarkOwl, data = data_subset, alternative = "greater")

# display t-test results
print(t_test_3)
## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.2115
## alternative hypothesis: true difference in means between group Lark and group Owl is greater than 0
## 95 percent confidence interval:
##  -0.1372184        Inf
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735

The one-sided t-test yields a p-value (0.2115) that is greater than 0.05. There is therefore no significant evidence that students who identify as “larks” have better cognitive skills (higher cognition z-scores) than students who identify as “owls”.
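The direction of the one-sided test depends on the order of the group levels. With the formula interface, `t.test` takes the first level minus the second, and R orders character labels alphabetically by default. The short sketch below, using a made-up `chronotype` vector, shows why `alternative = "greater"` here means "Lark mean greater than Owl mean":

```r
# a few labels of the kind found in the LarkOwl column (illustrative vector)
chronotype <- c("Owl", "Lark", "Owl", "Lark")

# default level order is alphabetical, so "Lark" is the first group
level_order <- levels(factor(chronotype))
print(level_order)

# to make "Owl" the first group instead, set the levels explicitly
owl_first <- levels(factor(chronotype, levels = c("Owl", "Lark")))
print(owl_first)
```

This matches the printed alternative hypothesis, which describes the difference between group Lark and group Owl.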

Question 4: Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class and those who didn’t?

# perform t-test
t_test_4 <- t.test(ClassesMissed ~ EarlyClass, data = data, alternative = "two.sided")

# display t-test results
print(t_test_4)
## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095

The t-test yields a p-value (0.1421) that is greater than 0.05. The data therefore show no significant difference in the average number of classes missed in a semester between students who had at least one early class and those who didn’t.

Question 5: Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

# define data group
data$DepressionGroup <- ifelse(data$DepressionScore >= 10, "ModerateOrHigher", "Normal")

# perform t-test
t_test_5 <- t.test(Happiness ~ DepressionGroup, data = data)

# display t-test results
print(t_test_5)
## 
##  Welch Two Sample t-test
## 
## data:  Happiness by DepressionGroup
## t = -5.6339, df = 55.594, p-value = 0.0000006057
## alternative hypothesis: true difference in means between group ModerateOrHigher and group Normal is not equal to 0
## 95 percent confidence interval:
##  -7.379724 -3.507836
## sample estimates:
## mean in group ModerateOrHigher           mean in group Normal 
##                       21.61364                       27.05742

The t-test yields a p-value (0.0000006057) that is less than 0.05. This indicates a significant difference in the average happiness level between students with at least moderate depression (sample mean 21.61) and students with normal depression status (sample mean 27.06).
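The grouping for Question 5 is built from `DepressionScore` with a cutoff of 10, while the data set also records a `DepressionStatus` label. As a spot check under that setup, the sketch below re-enters the two columns for the six rows printed by `head(data)` and cross-tabulates the derived group against the recorded status (the `first_rows` object is introduced here for illustration):

```r
# DepressionScore and DepressionStatus for the six rows shown by head(data)
first_rows <- data.frame(
  DepressionScore  = c(4, 1, 18, 1, 7, 14),
  DepressionStatus = c("normal", "normal", "moderate",
                       "normal", "normal", "moderate")
)

# same grouping rule as in the analysis
first_rows$DepressionGroup <- ifelse(first_rows$DepressionScore >= 10,
                                     "ModerateOrHigher", "Normal")

# cross-tabulate the derived group against the recorded status
agreement <- table(first_rows$DepressionGroup, first_rows$DepressionStatus)
print(agreement)
```

Six rows are only a spot check. Running the same `table()` call on the full `data` object would show whether the cutoff reproduces the recorded status for every student.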

Question 6: Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter and those who didn’t?

# perform t-test
t_test_6 <- t.test(AverageSleep ~ AllNighter, data = data)

# display t-test results
print(t_test_6)
## 
##  Welch Two Sample t-test
## 
## data:  AverageSleep by AllNighter
## t = 4.4256, df = 42.171, p-value = 0.00006666
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.4366603 1.1685667
## sample estimates:
## mean in group 0 mean in group 1 
##        8.073790        7.271176

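The t-test yields a p-value (0.00006666) that is less than 0.05. Because the test uses the AverageSleep variable, it indicates a significant difference in average sleep (8.07 in group 0 versus 7.27 in group 1) between students who reported having at least one all-nighter and those who didn’t. The sleep quality measure in the data set, PoorSleepQuality, is not examined by this test.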
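Question 6 asks about sleep quality, and the data set contains a `PoorSleepQuality` column, whereas the test above is run on `AverageSleep`. A minimal sketch of the same Welch test on the sleep quality column follows. It reloads the data from the URL used earlier, and the name `t_test_6_quality` is introduced here for illustration. No output is shown because this variant was not part of the original run.

```r
# reload the data set from the same URL used at the start of the analysis
data <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")

# Welch t-test on the sleep quality score, grouped by all-nighter status
t_test_6_quality <- t.test(PoorSleepQuality ~ AllNighter, data = data)

# display t-test results
print(t_test_6_quality)
```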

Question 7: Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

# define data subset
data_subset <- subset(data, AlcoholUse %in% c("Abstain", "Heavy"))

# perform t-test
t_test_7 <- t.test(StressScore ~ AlcoholUse, data = data_subset, alternative = "less")

# display t-test results
print(t_test_7)
## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.2681
## alternative hypothesis: true difference in means between group Abstain and group Heavy is less than 0
## 95 percent confidence interval:
##      -Inf 2.515654
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500

The one-sided t-test yields a p-value (0.2681) that is greater than 0.05. There is therefore no significant evidence that students who abstain from alcohol use have better (lower) stress scores than students who report heavy alcohol use, although the sample mean is lower for abstainers (8.97 versus 10.44).

Question 8: Is there a significant difference in the average number of drinks per week between students of different genders?

# perform t-test
t_test_8 <- t.test(Drinks ~ Gender, data = data)

# display t-test results
print(t_test_8)
## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 0.000000007002
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216

The t-test yields a p-value (0.000000007002) that is less than 0.05. This indicates a significant difference in the average number of drinks per week between students of different genders: the sample mean is 4.24 in group 0 and 7.54 in group 1.

Question 9: Is there a significant difference in the average weekday bedtime between students with high and low stress?

# define data group
data$StressGroup <- ifelse(data$StressScore >= 15, "HighStress", "LowStress")

# perform t-test
t_test_9 <- t.test(WeekdayBed ~ StressGroup, data = data, alternative = "two.sided")

# display t-test results
print(t_test_9)
## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by StressGroup
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group HighStress and group LowStress is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
## mean in group HighStress  mean in group LowStress 
##                 24.71500                 24.88543

The t-test yields a p-value (0.2855) that is greater than 0.05. The data therefore show no significant difference in the average weekday bedtime between students with high stress (sample mean 24.72) and students with low stress (sample mean 24.89).
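The group means for `WeekdayBed` are easier to interpret as clock times. The sketch below assumes the bedtime columns are coded in decimal hours, with values above 24 meaning after midnight. That coding is suggested by values such as 25.75 in the first rows but is not stated in the text. The helper `decimal_hour_to_clock` is a hypothetical name introduced here.

```r
# convert decimal hours (where 24.0 is taken to mean midnight) to "HH:MM"
decimal_hour_to_clock <- function(x) {
  # wrap past-midnight values into 0-24 and round to whole minutes
  total_minutes <- round((x %% 24) * 60) %% (24 * 60)
  sprintf("%02d:%02d",
          as.integer(total_minutes %/% 60),
          as.integer(total_minutes %% 60))
}

# group means copied from the printed t-test output for Question 9
group_means <- c(HighStress = 24.71500, LowStress = 24.88543)
clock_times <- decimal_hour_to_clock(group_means)
print(clock_times)
```

Under that assumption, both group means fall within the first hour after midnight and are roughly ten minutes apart.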

Question 10: Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

# define data group
data$YearGroup <- ifelse(data$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")

# perform t-test
t_test_10 <- t.test(WeekendSleep ~ YearGroup, data = data)

# display t-test results
print(t_test_10)
## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    8.213592                    8.221892

The t-test yields a p-value (0.9618) that is greater than 0.05. The data therefore show no significant difference in the average hours of sleep on weekends between the first two class years (sample mean 8.21) and other class years (sample mean 8.22).
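Each of the ten tests above is judged separately against 0.05. As an optional robustness check, not part of the original analysis, the sketch below applies a Holm adjustment to the ten p-values copied from the printed outputs. The object names are illustrative, and the p-values for Questions 3 and 7 are one-sided.

```r
# p-values copied from the ten printed t-test outputs
p_values <- c(
  Q1 = 0.0001243, Q2 = 0.00004009, Q3 = 0.2115, Q4 = 0.1421,
  Q5 = 0.0000006057, Q6 = 0.00006666, Q7 = 0.2681,
  Q8 = 0.000000007002, Q9 = 0.2855, Q10 = 0.9618
)

# Holm adjustment for testing ten hypotheses together
adjusted <- p.adjust(p_values, method = "holm")

# questions that stay below 0.05 after adjustment
significant <- names(adjusted)[adjusted < 0.05]
print(significant)
```

With these values, the same five questions (1, 2, 5, 6, and 8) remain significant after adjustment, so the conclusions above do not change.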