We use the data from https://www.lock5stat.com/datapage3e.html
I propose the following 10 questions based on my own understanding of the sleep study data.
Is there a significant difference in the average GPA between male and female college students?
Is there a significant difference in the average number of early classes between the first two class years and other class years?
Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
Is there a significant difference in the average number of drinks per week between students of different genders?
Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?
Is there a significant difference in the average hours of sleep on weekends between students in their first two class years and other students?
We will explore the questions in detail.
sleep = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(sleep)
## Gender ClassYear LarkOwl NumEarlyClass EarlyClass GPA ClassesMissed
## 1 0 4 Neither 0 0 3.60 0
## 2 0 4 Neither 2 1 3.24 0
## 3 0 4 Owl 0 0 2.97 12
## 4 0 1 Lark 5 1 3.76 0
## 5 0 4 Owl 0 0 3.20 4
## 6 1 4 Neither 0 0 3.50 0
## CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1 -0.26 4 4 3 8
## 2 1.39 6 1 0 3
## 3 0.38 18 18 18 9
## 4 1.39 9 1 4 6
## 5 1.22 9 7 25 14
## 6 -0.04 6 14 8 28
## DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1 normal normal normal 15 28 Moderate 10
## 2 normal normal normal 4 25 Moderate 6
## 3 moderate severe normal 45 17 Light 3
## 4 normal normal normal 11 32 Light 2
## 5 normal severe normal 46 15 Moderate 4
## 6 moderate moderate high 50 22 Abstain 0
## WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1 25.75 8.70 7.70 25.75 9.50 5.88
## 2 25.70 8.20 6.80 26.00 10.00 7.25
## 3 27.44 6.55 3.00 28.00 12.59 10.09
## 4 23.50 7.17 6.77 27.00 8.00 7.25
## 5 25.90 8.67 6.09 23.75 9.50 7.00
## 6 23.80 8.95 9.05 26.00 10.75 9.00
## AverageSleep AllNighter
## 1 7.18 0
## 2 6.93 0
## 3 5.02 0
## 4 6.90 0
## 5 6.35 0
## 6 9.04 0
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.1 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.2.1
## ✔ purrr 1.0.4 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
t.test(GPA ~ Gender, data = sleep)
##
## Welch Two Sample t-test
##
## data: GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1
## 3.324901 3.123725
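For reference, the statistic and degrees of freedom in a Welch two-sample t-test follow the standard definitions below. The symbols are not from the output itself: \bar{x}_g, s_g^2 and n_g denote the sample mean, sample variance, and size of group g (0 or 1).

```latex
t = \frac{\bar{x}_0 - \bar{x}_1}{\sqrt{\dfrac{s_0^2}{n_0} + \dfrac{s_1^2}{n_1}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_0^2}{n_0} + \dfrac{s_1^2}{n_1}\right)^{2}}
{\dfrac{(s_0^2/n_0)^2}{n_0 - 1} + \dfrac{(s_1^2/n_1)^2}{n_1 - 1}}
```

As a quick check against the output above, the estimated difference 3.324901 - 3.123725 = 0.201176 sits at the midpoint of the reported 95 percent confidence interval (0.0998, 0.3025).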
library(dplyr)
sleep <- sleep %>%
mutate(ClassYearGroup = ifelse(ClassYear %in% c(1, 2), "FirstTwo", "Other"),
ClassYearGroup = factor(ClassYearGroup))
t.test(EarlyClass ~ ClassYearGroup, data = sleep)
##
## Welch Two Sample t-test
##
## data: EarlyClass by ClassYearGroup
## t = 2.3233, df = 224.26, p-value = 0.02106
## alternative hypothesis: true difference in means between group FirstTwo and group Other is not equal to 0
## 95 percent confidence interval:
## 0.02121868 0.25831438
## sample estimates:
## mean in group FirstTwo mean in group Other
## 0.7253521 0.5855856
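The test above uses EarlyClass, the 0/1 indicator of having at least one early class, so the group means are proportions. A minimal sketch of the same comparison on the count column NumEarlyClass could look like the following. The helper name compare_early_class_counts is hypothetical and not part of the original analysis, and no output is shown because the call was not run here.

```r
# Hypothetical helper (an assumption, not the author's code):
# compares the number of early classes between class-year groups.
compare_early_class_counts <- function(df) {
  # Group class years 1-2 versus all other years, as in the analysis above
  df$ClassYearGroup <- factor(
    ifelse(df$ClassYear %in% c(1, 2), "FirstTwo", "Other"),
    levels = c("FirstTwo", "Other")
  )
  # Welch two-sample t-test on the count rather than the 0/1 indicator
  t.test(NumEarlyClass ~ ClassYearGroup, data = df)
}

# Usage, assuming the full data set is read as shown earlier:
# sleep_full <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
# compare_early_class_counts(sleep_full)
```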
# Note: this overwrites sleep, so every later analysis uses only the Lark and Owl rows
sleep <- sleep %>%
filter(LarkOwl %in% c("Lark", "Owl")) %>%
mutate(LarkOwl = factor(LarkOwl))
t.test(CognitionZscore ~ LarkOwl, data = sleep)
##
## Welch Two Sample t-test
##
## data: CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
## -0.1893561 0.4465786
## sample estimates:
## mean in group Lark mean in group Owl
## 0.09024390 -0.03836735
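Because the filter for this question is assigned back to sleep, the "Neither" rows are no longer available to the later questions. One way to avoid that, sketched here under the assumption that the full data frame should stay intact, is to filter into a separate object. The helper keep_levels is hypothetical and uses only base R.

```r
# Hypothetical helper (an assumption, not the author's code):
# returns a filtered copy and leaves the input data frame untouched.
keep_levels <- function(df, column, keep) {
  out <- df[df[[column]] %in% keep, , drop = FALSE]
  # Re-create the factor so unused levels such as "Neither" are dropped
  out[[column]] <- factor(out[[column]], levels = keep)
  out
}

# Usage, assuming `sleep` still holds every row of SleepStudy.csv:
# lark_owl <- keep_levels(sleep, "LarkOwl", c("Lark", "Owl"))
# t.test(CognitionZscore ~ LarkOwl, data = lark_owl)
```

With this pattern, sleep keeps all of its rows for the remaining questions.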
t.test(ClassesMissed ~ EarlyClass, data = sleep)
##
## Welch Two Sample t-test
##
## data: ClassesMissed by EarlyClass
## t = 1.29, df = 65.583, p-value = 0.2016
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.6733423 3.1313255
## sample estimates:
## mean in group 0 mean in group 1
## 3.764706 2.535714
###Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
# Standardize case for DepressionStatus
sleep <- sleep %>%
mutate(DepressionStatus = tolower(DepressionStatus))
sleep <- sleep %>%
filter(DepressionStatus %in% c("normal", "moderate", "severe"))
# Create DepressionGroup
sleep <- sleep %>%
mutate(DepressionGroup = case_when(
DepressionStatus %in% c("moderate", "severe") ~ "AtLeastModerate",
DepressionStatus == "normal" ~ "Normal",
TRUE ~ NA_character_ # Exclude any unexpected values
)) %>%
filter(!is.na(DepressionGroup)) # Remove rows with NA in DepressionGroup
# Check the levels of DepressionGroup after mutation
table(sleep$DepressionGroup)
##
## AtLeastModerate Normal
## 17 73
# Run the t-test for happiness by depression group if the levels are correct
if (length(unique(sleep$DepressionGroup)) == 2) {
t.test(Happiness ~ DepressionGroup, data = sleep)
} else {
print("DepressionGroup does not have exactly two levels. Check the data.")
}
##
## Welch Two Sample t-test
##
## data: Happiness by DepressionGroup
## t = -2.8688, df = 20.245, p-value = 0.009414
## alternative hypothesis: true difference in means between group AtLeastModerate and group Normal is not equal to 0
## 95 percent confidence interval:
## -9.404905 -1.489535
## sample estimates:
## mean in group AtLeastModerate mean in group Normal
## 21.47059 26.91781
###Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
t.test(PoorSleepQuality ~ AllNighter, data = sleep)
##
## Welch Two Sample t-test
##
## data: PoorSleepQuality by AllNighter
## t = -1.1866, df = 24.745, p-value = 0.2467
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -2.6995792 0.7266063
## sample estimates:
## mean in group 0 mean in group 1
## 6.513514 7.500000
###Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
# AlcoholUse values are capitalized in the dataset (e.g. "Abstain", "Heavy"), so match them exactly
library(dplyr)
sleep %>%
filter(AlcoholUse %in% c("Abstain", "Heavy")) %>%
group_by(AlcoholUse) %>%
summarise(mean_stress = mean(StressScore, na.rm = TRUE))
## # A tibble: 2 × 2
## AlcoholUse mean_stress
## <chr> <dbl>
## 1 Abstain 8.36
## 2 Heavy 9.17
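The summary above reports only the two group means. To judge significance, a Welch t-test on the same two groups would be needed; the sketch below shows one way to set it up. The helper welch_two_levels is hypothetical, and treating a lower StressScore as "better" is an assumption. No output is shown because the call was not run here.

```r
# Hypothetical helper (an assumption, not the author's code):
# Welch t-test of `outcome` between two chosen levels of `group`.
welch_two_levels <- function(df, outcome, group, levels,
                             alternative = "two.sided") {
  sub <- df[df[[group]] %in% levels, , drop = FALSE]
  x <- sub[[outcome]][sub[[group]] == levels[1]]  # first level, e.g. "Abstain"
  y <- sub[[outcome]][sub[[group]] == levels[2]]  # second level, e.g. "Heavy"
  # alternative = "less" asks whether the first level has the lower mean
  t.test(x, y, alternative = alternative)
}

# Usage (one-sided, assuming lower stress scores are better):
# welch_two_levels(sleep, "StressScore", "AlcoholUse",
#                  c("Abstain", "Heavy"), alternative = "less")
```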
###Q8. Is there a significant difference in the average number of drinks per week between students of different genders?
# Calculate mean Drinks per week by Gender
sleep %>%
group_by(Gender) %>%
summarise(mean_drinks = mean(Drinks, na.rm = TRUE))
## # A tibble: 2 × 2
## Gender mean_drinks
## <int> <dbl>
## 1 0 4
## 2 1 8.15
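As with the previous question, only means are computed here. A hedged sketch of the corresponding Welch t-test follows; it also labels the Gender codes so the output is easier to read. The helper drinks_by_gender is hypothetical, and the coding 0 = female, 1 = male is taken from the conclusions in this write-up rather than verified against the data dictionary.

```r
# Hypothetical helper (an assumption, not the author's code):
# Welch t-test of weekly drinks between the two Gender codes.
drinks_by_gender <- function(df) {
  # Assumed coding: 0 = female, 1 = male
  df$GenderLabel <- factor(df$Gender, levels = c(0, 1),
                           labels = c("Female", "Male"))
  t.test(Drinks ~ GenderLabel, data = df)
}

# Usage:
# drinks_by_gender(sleep)
```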
###Q9. Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress=high vs. Stress=normal)?
# Filter only "High" and "Normal" stress groups
sleep_filtered <- sleep %>%
filter(Stress %in% c("high", "normal"))
# Run t-test on WeekdayBed between the two stress levels
t.test(WeekdayBed ~ Stress, data = sleep_filtered)
##
## Welch Two Sample t-test
##
## data: WeekdayBed by Stress
## t = -0.7283, df = 32.223, p-value = 0.4717
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
## -0.7797764 0.3689431
## sample estimates:
## mean in group high mean in group normal
## 24.75000 24.95542
###Q10. Is there a significant difference in the average hours of sleep on weekends between students in their first two class years and other students?
# Create ClassYearGroup based on ClassYear column
sleep_filtered <- sleep %>%
mutate(ClassYearGroup = ifelse(ClassYear %in% c(1, 2), "FirstTwo", "Other"))
# Run t-test for average weekend sleep hours
t.test(WeekendSleep ~ ClassYearGroup, data = sleep_filtered)
##
## Welch Two Sample t-test
##
## data: WeekendSleep by ClassYearGroup
## t = -0.7061, df = 68.001, p-value = 0.4825
## alternative hypothesis: true difference in means between group FirstTwo and group Other is not equal to 0
## 95 percent confidence interval:
## -0.8909199 0.4252056
## sample estimates:
## mean in group FirstTwo mean in group Other
## 8.094000 8.326857
Q1. Yes, there is a statistically significant difference in average GPA between male and female students (p = 0.0001). Since the p-value is less than 0.05, we reject the null hypothesis and conclude that the GPA difference is unlikely to be due to chance. Specifically, students in group 0 (females) have a higher average GPA (3.32) than those in group 1 (males, 3.12).
Q2. Since the p-value (0.021) is less than 0.05, we reject the null hypothesis. Note that the test was run on EarlyClass, the 0/1 indicator of having at least one early class, not on the count NumEarlyClass, so the group means are proportions: about 73% of students in the first two years had at least one early class, compared with about 59% of students in later years. This supports a difference in early-class exposure between the two groups, but it does not directly test the average number of early classes.
Q3. There is no significant evidence that students who identify as “Larks” have better cognitive skills (cognition z-score) than students who identify as “Owls.” The mean for Larks (0.09) is higher than the mean for Owls (-0.04), but the p-value of 0.4229 is greater than the typical significance threshold of 0.05, so we fail to reject the null hypothesis.
Q4. There is no significant evidence that the average number of classes missed differs between students who had at least one early class (mean 2.54) and those who did not (mean 3.76). The p-value (0.2016) is greater than 0.05, so the observed difference is consistent with random chance.
Q5. Students with at least moderate depression have significantly lower happiness scores (mean 21.47) than students with normal depression status (mean 26.92); the p-value is 0.0094. The test compares only these two groups, so it does not by itself show that happiness falls steadily as depression becomes more severe.
Q6. The p-value (0.2467) is greater than 0.05. This suggests that there is no significant difference in poor sleep quality scores between students who had at least one all-nighter (mean 7.50) and those who did not (mean 6.51), based on this t-test result.
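Q7. Those who abstain have a mean stress score of 8.36, while those who report heavy alcohol use have a mean stress score of 9.17. Only the group means were computed here and no hypothesis test was run, so this output alone does not establish whether the difference is statistically significant.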
Q8. Students in group 0 (women) report a mean of 4 drinks per week, while students in group 1 (men) report a mean of 8.15 drinks per week. Only the group means were computed here and no hypothesis test was run, so this output alone does not establish whether the difference is statistically significant.
Q9. Because the p-value (0.4717) is greater than the typical significance level of 0.05, we cannot reject the null hypothesis. This means there is no statistically significant difference in average weekday bedtime between students with high stress (mean 24.75) and those with normal stress (mean 24.96).
Q10. The Welch two-sample t-test for weekend sleep between students in their first two years (FirstTwo, mean 8.09 hours) and students in other years (Other, mean 8.33 hours) indicates no significant difference between the two groups. The p-value of 0.4825 is well above the typical significance level of 0.05.