Introduction

For this project, I explore the 10 questions provided on D2L and analyze each one with a two-sample t-test. They are:

  1. Is there a significant difference in the average GPA between male and female college students?

  2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

  3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

  4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

  5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

  6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

  7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

  8. Is there a significant difference in the average number of drinks per week between students of different genders?

  9. Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?

  10. Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

Data

All of the data for this project was sourced from https://www.lock5stat.com/datapage3e.html. The dataset is named SleepStudy, and the analysis uses the Excel sheet downloaded from that website. Here is a list of variables and descriptions from the website:

- Gender: 1 = male, 0 = female
- ClassYear: Year in school, 1 = first year, …, 4 = senior
- LarkOwl: Early riser or night owl? Lark, Neither, or Owl
- NumEarlyClass: Number of classes per week before 9 am
- EarlyClass: Indicator for any early classes
- GPA: Grade point average (0-4 scale)
- ClassesMissed: Number of classes missed in a semester
- CognitionZscore: Z-score on a test of cognitive skills
- PoorSleepQuality: Measure of sleep quality (higher values = poorer sleep)
- DepressionScore: Measure of degree of depression
- AnxietyScore: Measure of amount of anxiety
- StressScore: Measure of amount of stress
- DepressionStatus: Coded depression score: normal, moderate, or severe
- AnxietyStatus: Coded anxiety score: normal, moderate, or severe
- Stress: Coded stress score: normal or high
- DASScore: Combined score for depression, anxiety, and stress
- Happiness: Measure of degree of happiness
- AlcoholUse: Self-reported: Abstain, Light, Moderate, or Heavy
- Drinks: Number of alcoholic drinks per week
- WeekdayBed: Average weekday bedtime (24.0 = midnight)
- WeekdayRise: Average weekday rise time (8.0 = 8 am)
- WeekdaySleep: Average hours of sleep on weekdays
- WeekendBed: Average weekend bedtime (24.0 = midnight)
- WeekendRise: Average weekend rise time (8.0 = 8 am)
- WeekendSleep: Average hours of sleep on weekends
- AverageSleep: Average hours of sleep for all days
- AllNighter: Had an all-nighter this semester? 1 = yes, 0 = no
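
The code in the Appendix refers to a data frame named sleep_data, but the step that creates it is not shown in this report. The following is a minimal sketch of one way to load it, assuming the download was saved as a CSV file. The helper name load_sleep_data and the file name SleepStudy.csv are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): read the SleepStudy
# data from a CSV file and confirm that the columns used in this report exist.
load_sleep_data <- function(path,
                            needed = c("Gender", "ClassYear", "LarkOwl",
                                       "NumEarlyClass", "EarlyClass", "GPA",
                                       "ClassesMissed", "CognitionZscore",
                                       "PoorSleepQuality", "StressScore",
                                       "DepressionStatus", "Stress",
                                       "Happiness", "AlcoholUse", "Drinks",
                                       "WeekdayBed", "WeekendSleep",
                                       "AllNighter")) {
  # Keep text columns as character so later ifelse() recoding keeps the labels.
  sleep_data <- read.csv(path, stringsAsFactors = FALSE)

  # Stop early with a readable message if any expected column is absent.
  missing_cols <- setdiff(needed, names(sleep_data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  sleep_data
}

# Assumed file name; adjust to wherever the download was saved.
# sleep_data <- load_sleep_data("SleepStudy.csv")
```

Checking the column names up front is a design choice: a misspelled or missing variable then fails at load time rather than in the middle of a t-test.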

Analysis

Question 1

## 
##  Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9962, df = 251, p-value = 8.465e-05
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.1020292 0.3003212
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725

We reject the null hypothesis. The p-value is much smaller than the 0.05 threshold for statistical significance, so we conclude that there is a difference in average GPA between male and female students, with females having the higher GPA on average. With 95 percent confidence, the average GPA for females is between 0.102 and 0.300 points higher than for males.

Question 2

## 
##  Two Sample t-test
## 
## data:  NumEarlyClass by ClassYearGroup
## t = 4.0419, df = 251, p-value = 7.056e-05
## alternative hypothesis: true difference in means between group FirstTwoYears and group UpperYears is not equal to 0
## 95 percent confidence interval:
##  0.391789 1.136443
## sample estimates:
## mean in group FirstTwoYears    mean in group UpperYears 
##                    2.070423                    1.306306

We reject the null hypothesis. The p-value is much smaller than the 0.05 threshold, so the difference in the average number of early classes between underclassmen and upperclassmen is statistically significant. Both bounds of the confidence interval are positive, which supports the conclusion that underclassmen have more early classes on average.

Question 3

## 
##  Two Sample t-test
## 
## data:  CognitionZscore by LarkOwlGroup
## t = 0.82293, df = 88, p-value = 0.4128
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1819703  0.4391928
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735

We do not reject the null hypothesis. The p-value is greater than the 0.05 threshold, so there is no statistically significant difference in cognitive skills between larks and owls. Also, one bound of the confidence interval is negative and the other is positive, so the interval contains zero and we cannot conclude that there is a difference in average cognitive skills between the two groups.
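
Question 3 is directional (it asks whether larks score better), while the test reported above is two-sided. As a rough check, a one-sided p-value can be recovered from the reported t statistic and degrees of freedom. This sketch only reuses the numbers printed in the output; it assumes the alternative of interest is that the Lark mean is greater than the Owl mean, which matches the group order R used.

```r
# Values copied from the Question 3 output.
t_stat <- 0.82293
deg_free <- 88

# Two-sided p-value, the quantity t.test() reported.
p_two_sided <- 2 * pt(abs(t_stat), df = deg_free, lower.tail = FALSE)

# One-sided p-value for the alternative "Lark mean minus Owl mean is greater than 0".
p_one_sided <- pt(t_stat, df = deg_free, lower.tail = FALSE)

round(c(two_sided = p_two_sided, one_sided = p_one_sided), 4)
```

The one-sided p-value is half the two-sided one (about 0.21), which is still well above 0.05, so the conclusion does not change. The same result should come from passing alternative = "greater" to t.test.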

Question 4

## 
##  Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.5319, df = 251, p-value = 0.1268
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.1882095  1.5061367
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095

We do not reject the null hypothesis. The p-value is greater than the 0.05 threshold, so there is no statistically significant difference in classes missed between students with at least one early class and students with none. Also, one bound of the confidence interval is negative and the other is positive, so we cannot conclude that there is a difference in the average number of missed classes between the two groups.

Question 5

## 
##  Two Sample t-test
## 
## data:  Happiness by DepressionGroup
## t = -6.4426, df = 251, p-value = 5.954e-10
## alternative hypothesis: true difference in means between group Moderate/Severe and group Normal is not equal to 0
## 95 percent confidence interval:
##  -7.107907 -3.779653
## sample estimates:
## mean in group Moderate/Severe          mean in group Normal 
##                      21.61364                      27.05742

We reject the null hypothesis. The p-value is much lower than the 0.05 threshold, so there is a statistically significant difference in happiness level between students whose depression status is moderate or severe and those whose status is normal. Both bounds of the confidence interval are negative, which also supports the conclusion that there is a difference. We can conclude that students with normal depression status have higher happiness levels on average.

Question 6

## 
##  Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.664, df = 251, p-value = 0.09737
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9486940  0.1638431
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412

We do not reject the null hypothesis. The p-value is slightly higher than the 0.05 threshold, so there is no statistically significant difference between sleep quality for students who did have an all-nighter and those who did not. Also, one of the bounds on the confidence interval is negative, and the other is positive, which also means we cannot conclude there is a difference.

Question 7

## 
##  Two Sample t-test
## 
## data:  StressScore by AlcoholUseGroup
## t = -0.63251, df = 48, p-value = 0.5301
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
##  -6.129928  3.196104
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500

We do not reject the null hypothesis. The p-value is much higher than the 0.05 threshold, so there is no statistically significant difference in stress score for those who report heavy alcohol use and those who abstain from it. Also, one bound of the confidence interval is negative and the other is positive, which also means we cannot conclude there is a difference.

Question 8

## 
##  Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.8358, df = 251, p-value = 6.16e-11
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.251794 -2.349816
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216

We reject the null hypothesis. The p-value is much lower than the 0.05 threshold, which means there is a significant difference in the average number of drinks per week between males and females. Also, both bounds of the confidence interval are negative, which also supports the conclusion that there is a difference. Men on average drink more alcohol per week than women.
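
The sign of the interval depends on group order: t.test reports the interval for the mean of the first group minus the mean of the second group. A short check using only the numbers printed in the Question 8 output (with Gender coded 0 = female, 1 = male, as in the Data section) illustrates this.

```r
# Values copied from the Question 8 output (Gender: 0 = female, 1 = male).
mean_female <- 4.238411
mean_male <- 7.539216
conf_int <- c(-4.251794, -2.349816)

# The interval is for (first group) - (second group), here female - male.
diff_means <- mean_female - mean_male

# The observed difference sits at the centre of the interval.
ci_midpoint <- mean(conf_int)

c(diff_means = diff_means, ci_midpoint = ci_midpoint)
```

Both values are about -3.30, so a negative interval here means the female average is lower than the male average.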

Question 9

## 
##  Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0891, df = 251, p-value = 0.2771
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4786176  0.1377546
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543

We do not reject the null hypothesis. The p-value is higher than the 0.05 threshold, which means there is not a significant difference in weekday bedtimes between students with high stress and students with normal stress. The confidence interval also has one positive and one negative bound, which further supports that we cannot conclude there is a difference.

Question 10

## 
##  Two Sample t-test
## 
## data:  WeekendSleep by ClassYearGroup
## t = -0.047839, df = 251, p-value = 0.9619
## alternative hypothesis: true difference in means between group FirstTwoYears and group UpperYears is not equal to 0
## 95 percent confidence interval:
##  -0.3500149  0.3334142
## sample estimates:
## mean in group FirstTwoYears    mean in group UpperYears 
##                    8.213592                    8.221892

We do not reject the null hypothesis. The p-value is higher than the 0.05 threshold, which means there is not a significant difference in weekend hours of sleep for underclassmen and upperclassmen. Also, the confidence interval has one positive and one negative bound, which further supports that we cannot conclude there is a difference.

Summary

In this project I used two-sample t-tests to determine whether the average of a numeric variable differs significantly between two groups of students. For some questions, the two groups first had to be built by recoding another variable, such as combining class years. A t-test has two main outputs that are useful for determining whether there is a significant difference. The first is the p-value. To reject the null hypothesis and conclude that there is a significant difference, the p-value has to be less than 0.05. The other output is the confidence interval. This interval gives a range of plausible values for the difference between the first group's average and the second group's average, with 95 percent confidence. If both bounds have the same sign, both positive or both negative, the interval excludes zero and there is a significant difference between the averages. If one bound is negative and the other is positive, the interval contains zero and we cannot conclude that there is a difference.
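
To make the link between these two outputs concrete, the sketch below rebuilds the p-value and the confidence interval for Question 1 from the t statistic, degrees of freedom, and group means printed in that output. It relies only on those reported numbers and on the identity t = difference / standard error; it does not rerun the test on the data.

```r
# Values copied from the Question 1 output.
mean_group0 <- 3.324901   # Gender = 0 (female)
mean_group1 <- 3.123725   # Gender = 1 (male)
t_stat <- 3.9962
deg_free <- 251

# Observed difference in means and the standard error implied by t.
diff_means <- mean_group0 - mean_group1
std_error <- diff_means / t_stat

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(abs(t_stat), df = deg_free, lower.tail = FALSE)

# 95 percent confidence interval: difference +/- critical value * standard error.
margin <- qt(0.975, df = deg_free) * std_error
conf_int <- c(lower = diff_means - margin, upper = diff_means + margin)

p_value
conf_int
```

Up to rounding, this reproduces the reported p-value of 8.465e-05 and the interval from 0.102 to 0.300. It also shows why the two outputs agree: the interval excludes zero exactly when the absolute t statistic exceeds the critical value, which is the same condition as a p-value below 0.05.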

Appendix

# Assumes the SleepStudy data has already been loaded into a data frame named sleep_data.

# Question 1
t.test(GPA ~ Gender, data = sleep_data, alternative = "two.sided", var.equal = TRUE)

# Question 2
# Create a new group for ClassYear (first two years vs. others)
sleep_data$ClassYearGroup <- ifelse(sleep_data$ClassYear %in% c(1, 2), "FirstTwoYears", "UpperYears")

# Perform the t-test
t.test(NumEarlyClass ~ ClassYearGroup, data = sleep_data, alternative = "two.sided", var.equal = TRUE)

# Question 3
# Create a group variable for LarkOwl
sleep_data$LarkOwlGroup <- ifelse(sleep_data$LarkOwl %in% c("Lark", "Owl"), sleep_data$LarkOwl, NA)

# Perform the t-test
t.test(CognitionZscore ~ LarkOwlGroup, data = sleep_data, alternative = "two.sided", var.equal = TRUE, na.action = na.omit)

# Question 4
# Perform the t-test directly since EarlyClass is binary
t.test(ClassesMissed ~ EarlyClass, data = sleep_data, alternative = "two.sided", var.equal = TRUE)

# Question 5
# Create a group variable for DepressionStatus
sleep_data$DepressionGroup <- ifelse(sleep_data$DepressionStatus == "normal", "Normal", "Moderate/Severe")

# Perform the t-test
t.test(Happiness ~ DepressionGroup, data = sleep_data, alternative = "two.sided", var.equal = TRUE)

# Question 6
# Perform the t-test directly since AllNighter is binary
t.test(PoorSleepQuality ~ AllNighter, data = sleep_data, alternative = "two.sided", var.equal = TRUE)

# Question 7
# Create a group variable for AlcoholUse
sleep_data$AlcoholUseGroup <- ifelse(sleep_data$AlcoholUse %in% c("Abstain", "Heavy"), sleep_data$AlcoholUse, NA)

# Perform the t-test
t.test(StressScore ~ AlcoholUseGroup, data = sleep_data, alternative = "two.sided", var.equal = TRUE, na.action = na.omit)

# Question 8
# Perform the t-test directly since Gender is binary
t.test(Drinks ~ Gender, data = sleep_data, alternative = "two.sided", var.equal = TRUE)

# Question 9
# Perform the t-test directly since Stress is already categorized
t.test(WeekdayBed ~ Stress, data = sleep_data, alternative = "two.sided", var.equal = TRUE)

# Question 10
# Perform the t-test using the ClassYearGroup created earlier
t.test(WeekendSleep ~ ClassYearGroup, data = sleep_data, alternative = "two.sided", var.equal = TRUE)
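
Every test above sets var.equal = TRUE, which pools the two group variances. R's default for t.test is the Welch version, which does not assume equal variances. The following is a sketch of how one might check whether that choice matters. It uses simulated stand-in data, not the SleepStudy values, and the names toy, score, and group are hypothetical.

```r
# Simulated stand-in data (NOT the SleepStudy values), used only so the sketch runs.
set.seed(1)
toy <- data.frame(
  score = c(rnorm(30, mean = 3.3, sd = 0.4), rnorm(20, mean = 3.1, sd = 0.6)),
  group = rep(c("A", "B"), times = c(30, 20))
)

# Pooled-variance test, as used throughout the appendix.
pooled <- t.test(score ~ group, data = toy, var.equal = TRUE)

# Welch test (R's default), which allows the group variances to differ.
welch <- t.test(score ~ group, data = toy, var.equal = FALSE)

# Same group means; the degrees of freedom and p-values can differ.
c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))
c(pooled_p = pooled$p.value, welch_p = welch$p.value)
```

If the two versions lead to the same decision for a given question, the equal-variance assumption is not driving the conclusion.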

References

All of the data for this report was obtained from this website: https://www.lock5stat.com/datapage3e.html