Introduction

In this report, I explore the data set titled “SleepStudy,” which was acquired from https://lock5stat.com/datapage3e.html (Lock5). The exploration was carried out with R Markdown in the RStudio integrated development environment (IDE). The dplyr library was used to “pipe” the data set into each calculation, which leaves the original data set unchanged when specific columns are filtered or recoded for clarity of the output. The large language model (LLM) ChatGPT, version 4o, was used occasionally to assist with the CSS formatting in the appendix.

The objective of this analysis is to determine if there are statistically significant differences between sample groups of college students in various metrics of academic and personal health, including GPA, cognitive performance, mental health, and sleep quality. The questions that will be used to analyze the data set are as follows:

  1. Is there a significant difference in the average GPA between male and female college students?

  2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

  3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

  4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class and those who didn’t?

  5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

  6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter and those who didn’t?

  7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

  8. Is there a significant difference in the average number of drinks per week between students of different genders?

  9. Is there a significant difference in the average weekday bedtime between students with high and low stress?

  10. Is there a significant difference in the average hours of sleep on weekends between first and second year students and other students?

Data

The data set, SleepStudy, was sourced by Lock5 from the authors of “Class Start Times, Sleep, and Academic Performance in College: A Path Analysis” (April 2012). Per the data guide provided by Lock5 (https://www.lock5stat.com/datasets3e/Lock5DataGuide3e.pdf), it comes from a sample of students who took a series of skills tests to measure cognitive function and kept a sleep diary over a two-week period. The data set includes 253 observations on 27 variables relating to sleep, mental health, alcohol use, and academic metrics such as GPA, class enrollment, and missed classes. Demographic information is limited to what is included in the data set itself: gender and class year.

Analysis

This section contains the calculations used to determine whether the differences between samples are statistically significant. The calculations were performed with Welch two-sample t-tests (t-tests), whose outputs are included for easier reference. Each output also reports the observed difference between the two estimated sample means and the margin of error. The Standard Error (SE) was recovered by dividing the observed difference by the t-statistic, and the margin of error is that SE multiplied by the critical t-value (t*). The t* comes from the quantile function at a significance level of α = 0.05, qt(1-α/2, df), where df is the degrees of freedom calculated from the Welch-Satterthwaite equation. Throughout the analysis, the commonly accepted thresholds for the t-statistic (2.5 SEs) and the degrees of freedom (DF; 30 and 100) are used, and all confidence intervals (CIs) are 95% intervals because α = 0.05. Confidence interval values are approximated to four or five significant figures as appropriate. The p-values (p) are expressed in scientific notation and compared against α, with the expressions p > α and p < α used or implied.
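The margin-of-error calculation described above can be sketched on R’s built-in `sleep` data set. This is a stand-in used only so the example is self-contained; it is not the SleepStudy data, and the variable names are illustrative. The final lines print the interval rebuilt from the observed difference and the margin of error next to the interval reported by t.test(), which should agree.

```r
# Welch two-sample t-test on R's built-in `sleep` data
# (illustrative stand-in, NOT the SleepStudy data analyzed in this report)
result <- t.test(extra ~ group, data = sleep)

alpha    <- 0.05
obs_diff <- unname(result$estimate[1] - result$estimate[2]) # observed difference of means
t_val    <- unname(result$statistic)                         # t-statistic
dof      <- unname(result$parameter)                         # Welch-Satterthwaite df
se       <- obs_diff / t_val                                 # SE recovered from t = diff / SE
t_crit   <- qt(1 - alpha / 2, dof)                           # critical t-value (t*)
moe      <- t_crit * se                                      # margin of error

# Observed difference +/- margin of error should match the reported 95% CI
ci <- c(obs_diff - moe, obs_diff + moe)
print(ci)
print(as.numeric(result$conf.int))
```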

Question 1

Is there a significant difference in the average GPA between male and female college students?
## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725 
## 
## The observed difference is: 0.2012 
## The Margin of Error is: 0.1014

In the comparison of GPAs between male and female students, the t-statistic (the difference of sample means expressed in standard errors) was calculated to be 3.9139, meaning the sample means are 3.9139 SEs apart. This is considerably larger than the commonly accepted factor of 2.5 and indicates that the difference is larger than would be expected from random chance; this is the first indicator that the results are statistically significant. A DF > 100 indicates that using this specific t-distribution instead of the normal distribution (or z-distribution) to calculate t* and p makes almost no difference. The p calculated from the t-test was 1.243e-4, considerably lower than α. As p < α, the null hypothesis can be rejected. With such a large t-statistic and small p, and a CI of (0.09982, 0.30253) that does not include 0, it can be concluded that the observed difference, with female students having a GPA that is 0.2012 points higher on average than male students, is statistically significant.
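As a rough check, the margin of error and confidence interval for Question 1 can be rebuilt from the numbers printed in the output above. This sketch only re-does the arithmetic on the reported values (it does not load the data set), and the variable names are illustrative.

```r
# Values copied from the Question 1 t-test output above
t_val  <- 3.9139
dof    <- 200.9
mean_0 <- 3.324901 # mean in group 0
mean_1 <- 3.123725 # mean in group 1
alpha  <- 0.05

obs_diff <- mean_0 - mean_1          # observed difference of sample means
se       <- obs_diff / t_val         # standard error recovered from the t-statistic
t_crit   <- qt(1 - alpha / 2, dof)   # critical t-value (t*)
moe      <- t_crit * se              # margin of error
ci       <- c(obs_diff - moe, obs_diff + moe)

cat("Observed difference:", round(obs_diff, 4), "\n")
cat("Margin of error:", round(moe, 4), "\n")
cat("95% CI:", round(ci, 4), "\n")
```

Small discrepancies in the last digit are possible because the printed t-statistic and DF are themselves rounded.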

Question 2

Is there a significant difference in the average number of early classes between the first two class years and other class years?
## 
##  Welch Two Sample t-test
## 
## data:  NumEarlyClass by YearGroups
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group First & Second Year and group Third & Fourth Year is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group First & Second Year mean in group Third & Fourth Year 
##                          2.070423                          1.306306 
## 
## The observed difference is: 0.7641 
## The Margin of Error is: 0.3599

“NumEarlyClass” was a variable in the data set representing the number of “early classes” that a student was enrolled in. An early class was defined by Lock5 as a class with a start time prior to 9:00 am (0900 in military time) in a given semester. In the analysis of how many early classes students were enrolled in, the sample was split into two groups: First & Second year students (colloquially: underclassmen) and Third & Fourth year students (colloquially: upperclassmen). The t-statistic shows the two group means to be 4.1813 SEs apart, a first indication of a statistically significant difference. The p of 4.009e-5 is considerably smaller than α, and the CI of (0.4042, 1.1240) again does not contain 0. Together these solidify the determination that the null hypothesis can be rejected: the observed difference, with underclassmen enrolled in 0.7641 more early classes on average (approximately one early class per semester), is statistically significant.

Question 3

Do students who identify as “Larks” have significantly better cognitive skills (cognition z-score) compared to “Owls”?
## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735 
## 
## The observed difference is: 0.1286 
## The Margin of Error is: 0.318

This question produces results that break what appeared to be a trend in the observations. Students could self-identify as “Larks,” “Owls,” or “Neither.” A Lark was defined as a student who was an “early riser,” with no indication given as to what constituted early in this context, while an Owl was someone who identified as a night owl, a person who stays awake exceptionally late into the evening. Students who identified as neither were excluded from the analysis. The first indicator analyzed was the t-statistic of 0.80571, which places the sample means less than one standard error apart. This is an early indicator that the difference is within the range of what can be attributed to noise or random chance. The p was calculated to be 4.229e-1, so p > α and there is no basis for rejecting the null hypothesis this time. The final determining factor was the CI, (-0.1894, 0.4466), which is the first CI in the analysis to include 0. Taken together, these results lead to the conclusion that the observed difference between the sample means is not statistically significant, and we cannot conclude that there is a significant relationship between an individual’s preferred waking hours and their cognitive skills. This is reinforced by the observed difference being smaller than the margin of error by a factor of approximately 2.47.

Question 4

Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class and those who didn’t?
## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095 
## 
## The observed difference is: 0.659 
## The Margin of Error is: 0.8823

This question aims to evaluate whether or not enrolling in an early class would make a student more likely to miss more classes than students who did not. In my experience, most students choose classes that fit their natural circadian rhythm, when possible, which would naturally skew the results of this analysis low; this is a purely anecdotal observation. The t-test resulted in a t-statistic of 1.4755 standard errors, which is a bit low when determining confidence regarding the statistical significance of the difference. The p was calculated to be 1.421e-1, which is larger than the α by nearly a factor of 3. 0 is once again an element of the CI: (-0.2234, 1.5413). Given the relatively low t-statistic, a p that is considerably larger than the α, and the CI having 0 as an element, it can be determined that the null hypothesis cannot be rejected, and our observed difference in missed classes of 0.659 is not a statistically significant difference when evaluated against enrollment status in early classes. It can once again be seen that the observed difference is smaller than the calculated margin of error, reinforcing the determination of a lack of significance.

Question 5

Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
## 
##  Welch Two Sample t-test
## 
## data:  Happiness by DepressionStat
## t = -5.6339, df = 55.594, p-value = 6.057e-07
## alternative hypothesis: true difference in means between group Elevated and group Normal is not equal to 0
## 95 percent confidence interval:
##  -7.379724 -3.507836
## sample estimates:
## mean in group Elevated   mean in group Normal 
##               21.61364               27.05742 
## 
## The observed difference is: -5.4438 
## The Margin of Error is: 1.9359

Question 5 is one of those questions that is often met with incredulity by the general public. Of course there would be a significant difference in happiness, what a silly question. But what is often lost on the layperson is that a) statistical significance is data driven and has a specific meaning, and b) such questions must be asked to empirically determine and define the difference so that further research has a foundational layer upon which to build. I digress.

For this question, the two samples were students whose depression score was in the “normal” range and students whose depression score was in the moderate or severe range. The t-statistic of -5.6339 means that the first sample, students with an elevated depression status, had an average happiness score 5.6339 SEs lower than the second sample, students with a normal depression status. This is already strong evidence of a difference between the sample means, and the null hypothesis is likely to be rejected. Combined with a p of 6.057e-7, the difference can almost be declared statistically significant without consulting the CI. Given that the CI is (-7.3797, -3.5078) and thus does not include 0, the observed difference in happiness of -5.4438 points is statistically significant.

Question 6

Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter and those who didn’t?
## 
##  Welch Two Sample t-test
## 
## data:  AverageSleep by AllNighters
## t = 4.4256, df = 42.171, p-value = 6.666e-05
## alternative hypothesis: true difference in means between group No all nighters and group One or more All-nighters is not equal to 0
## 95 percent confidence interval:
##  0.4366603 1.1685667
## sample estimates:
##          mean in group No all nighters mean in group One or more All-nighters 
##                               8.073790                               7.271176 
## 
## The observed difference is: 0.8026 
## The Margin of Error is: 0.366

For this analysis, students in the “all-nighter” sample needed to have reported at least one all-nighter in the semester; the number of all-nighters was not differentiated further. As the output shows, the variable tested was AverageSleep, so the comparison is in average hours of sleep. With a t-statistic of 4.4256 standard errors and a p of 6.666e-5, there is a very strong indication that the null hypothesis can be rejected and that there is a statistically significant difference between the sample means. Given that the CI of (0.4367, 1.1686) does not include 0 and the observed difference is more than twice the margin of error, the null hypothesis can be rejected, and the observed difference of 0.8026 hours (roughly 48 minutes of sleep) is statistically significant.

Question 7

Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500 
## 
## The observed difference is: -1.4669 
## The Margin of Error is: 4.7943

This question looks only at students who abstain and students who report heavy alcohol use; those who report light or moderate alcohol use are not included. Because this variable has four separate categories, the data had to be filtered so that only the relevant samples were analyzed. This is the first question in the analysis where the DF is under 30, meaning that it is very important to use the t-distribution in the calculations to ensure accurate results. A t-statistic of -0.62604 indicates that while those who abstain were measured to have a lower average stress score than those who report heavy alcohol use, the difference between sample means was just under two-thirds of a standard error. With a p of 5.362e-1, a full order of magnitude larger than α, the null hypothesis cannot be rejected. The CI of (-6.2612, 3.3273) contains 0 and confirms that the observed difference, although nearly 1.5 points lower on the stress measure, is not statistically significant and cannot be attributed to alcohol use. Once again, the absolute value of the observed difference is considerably smaller than the margin of error: less than one-third, in fact.
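To illustrate why a DF under 30 matters, the sketch below compares the critical t-value at each question’s reported DF against the normal (z) critical value at α = 0.05. The DF values are copied from the t-test outputs in this report; the names are illustrative.

```r
alpha <- 0.05

# Degrees of freedom as reported in each question's t-test output
dof <- c(Q1 = 200.9, Q2 = 250.69, Q3 = 75.331, Q4 = 152.78, Q5 = 55.594,
         Q6 = 42.171, Q7 = 28.733, Q8 = 142.75, Q9 = 87.048, Q10 = 237.36)

t_crit <- qt(1 - alpha / 2, dof) # critical t-value (t*) at each DF
z_crit <- qnorm(1 - alpha / 2)   # critical value from the normal distribution

comparison <- data.frame(
  df            = dof,
  t_crit        = round(t_crit, 4),
  excess_over_z = round(t_crit - z_crit, 4) # how much wider t* is than z*
)

# Sorted from smallest DF (largest gap) to largest DF (smallest gap)
print(comparison[order(comparison$df), ])
```

The gap between t* and the z critical value shrinks as DF grows, so using the normal distribution would understate the margin of error most for the smallest DF, which here belongs to Question 7.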

Question 8

Is there a significant difference in the average number of drinks per week between students of different genders?
## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Genders
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group Female   mean in group Male 
##             4.238411             7.539216 
## 
## The observed difference is: -3.3008 
## The Margin of Error is: 1.0592

Continuing with the drinking habits of college students, this question looks for a statistically significant disparity in average consumption between male and female students. A t-statistic of -6.1601 indicates a substantial difference between samples, with the female mean approximately 6.2 SEs lower than the male mean. A p of 7.002e-9 indicates that if the null hypothesis were true, the likelihood of observing a difference at least this extreme would be approximately 1 in 143 million. Finally, the CI of (-4.3600, -2.2416) again does not contain 0. Together these establish that the observed difference, with female students consuming more than 3 fewer alcoholic beverages per week on average than male students, is statistically significant.
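The “1 in 143 million” figure is simply the reciprocal of the reported p-value. A minimal sketch of that arithmetic, using the p copied from the output above:

```r
# p-value copied from the Question 8 t-test output
p_value <- 7.002e-09

# Reciprocal of p: "1 in N" chance of a difference at least this extreme
# if the null hypothesis were true
one_in <- 1 / p_value

cat("Approximately 1 in", round(one_in / 1e6), "million\n")
```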

Question 9

Is there a significant difference in the average weekday bedtime between students with high and low stress?
## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543 
## 
## The observed difference is: -0.1704 
## The Margin of Error is: 0.3152

It is not a very well kept secret that stress can impact sleep. Students with a coded stress score of “normal” scored between 0 and 14 points, while students with a coded stress score of “high” scored 15 or more. Once again, a two-sample t-test was performed, and the t-statistic was calculated to be just -1.0746, giving little confidence that the difference of means is significant. The p was calculated to be 2.855e-1, larger than α by a factor of 5.71. When these factors are combined with a 95% confidence interval of (-0.4857, 0.1448), which is a rather narrow range and includes 0, it can be confidently determined that the null hypothesis cannot be rejected. The observed difference of just -0.1704 hours, or about 10 minutes, is not statistically significant and sits well within the margin of error.
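The bedtime means are easier to read as clock times. The sketch below assumes WeekdayBed is recorded in decimal hours with 24.0 representing midnight; that is my reading of the values and should be checked against the Lock5 data guide. The helper `to_clock()` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: convert decimal hours to "HH:MM" clock time,
# ASSUMING values of 24 or more wrap past midnight (24.0 = 00:00)
to_clock <- function(hours) {
  total_min <- round(hours * 60) %% (24 * 60) # minutes since midnight
  sprintf("%02d:%02d",
          as.integer(total_min %/% 60),
          as.integer(total_min %% 60))
}

# Sample means copied from the Question 9 t-test output
mean_high   <- 24.71500
mean_normal <- 24.88543

print(to_clock(c(high = mean_high, normal = mean_normal)))

# Observed difference converted from hours to minutes
diff_min <- (mean_high - mean_normal) * 60
cat("Difference in minutes:", round(diff_min, 1), "\n")
```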

Question 10

Is there a significant difference in the average hours of sleep on weekends between first and second year students and other students?
## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by YearGroups
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group First & Second Year and group Third & Fourth Year is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group First & Second Year mean in group Third & Fourth Year 
##                          8.213592                          8.221892 
## 
## The observed difference is: -0.0083 
## The Margin of Error is: 0.3415

The final question in this analysis again relates to sleep. Is there a significant difference in the amount of weekend sleep between underclassmen and upperclassmen? Not really. With the estimated sample means being approximately 8.21 hours and 8.22 hours respectively, the observed difference between the sample means is just -0.0083 hours, or about 30 seconds. Had the t-test pointed to those 30 seconds being statistically significant, I would likely still have argued that they are not practically meaningful. Thankfully, the results of the t-test were a t-statistic of -0.047888, a p of 9.618e-1, and a CI of (-0.3498, 0.3332), with 0 again inside the interval. Such results indicate that the null hypothesis cannot be rejected, and the near-zero observed difference is fully consistent with the null hypothesis in this instance. It can be confidently determined that the result is not statistically significant.

Conclusion

As with all explorations of data and interpretive analysis of the results, conclusions do not always align with expectation. I was surprised to learn that upperclassmen get almost exactly the same amount of weekend sleep as underclassmen, despite having significantly more difficult coursework. I also expected to see a more significant difference in stress with regard to alcohol use. I am unsure whether the specific questions asked in this exercise can be used as the basis for recommendations about lifestyle changes, but they are excellent questions for demonstrating the use and importance of statistical inference.

Appendix

The following are the individual code chunks used in the setup of the markdown file, and to analyze each question. As can be seen, the “Global Output Function” was utilized in the code chunk of each question, and the use of the dplyr library significantly reduced the size of the required code, while ensuring integrity of the data set throughout the calculations.

```r Code Snippets
-------------------------------------------------------------------------------------------------------------------------------
Global Output Function

output <- function(result) {
  print(result) # Prints t.test() results from code chunk
  alpha <- 0.05
  t_val <- result$statistic # t-value (t-statistic): Difference of means relative to Standard Error
  M1 <- result$estimate[1] # Sample 1 estimated mean value
  M2 <- result$estimate[2] # Sample 2 estimated mean value
  df <- result$parameter # Degrees of freedom; Gives insight to how comparable our t-distribution is to the z-distribution
  SE <- (M1-M2)/t_val # Standard Error (SE) calculation
  t_critical <- qt(1-alpha/2, df) # Critical t-value (t*) calculation
  MoE <- t_critical * SE # Margin of Error Calculation
  ttRat <- t_val / t_critical # Ratio of t-statistic to critical t-value
  cat("The observed difference is:", round(M1-M2, 4), "\n")
  cat("The Margin of Error is:", round(MoE, 4), "\n")
}
-------------------------------------------------------------------------------------------------------------------------------
Question 1

result <- t.test(GPA ~ Gender, data = sleep_data)

output(result)
-------------------------------------------------------------------------------------------------------------------------------
Question 2

result <- sleep_data %>%
  mutate(YearGroups = ifelse(ClassYear %in% c(1,2),
                              "First & Second Year",
                              "Third & Fourth Year")) %>%
  t.test(NumEarlyClass ~ YearGroups, data = .) # . passes piped data into t.test()

output(result)
-------------------------------------------------------------------------------------------------------------------------------
Question 3

result <- sleep_data %>%
  filter(LarkOwl %in% c("Lark", "Owl")) %>%
  t.test(CognitionZscore ~ LarkOwl, data = .)

output(result)
-------------------------------------------------------------------------------------------------------------------------------
Question 4

result <- t.test(ClassesMissed ~ EarlyClass, data = sleep_data)

output(result)
-------------------------------------------------------------------------------------------------------------------------------
Question 5

result <- sleep_data %>%
  mutate(DepressionStat = ifelse(DepressionStatus %in% c("normal"),
                          "Normal", "Elevated")) %>%
  t.test(Happiness ~ DepressionStat, data = .)

output(result)
-------------------------------------------------------------------------------------------------------------------------------
Question 6

result <- sleep_data %>%
  mutate(AllNighters = ifelse(AllNighter %in% c(1), 
                              "One or more All-nighters", 
                              "No all nighters")) %>%
  t.test(AverageSleep ~ AllNighters, data = .)

output(result)
-------------------------------------------------------------------------------------------------------------------------------
Question 7

result <- sleep_data %>%
  filter(AlcoholUse %in% c("Abstain","Heavy")) %>%
  t.test(StressScore ~ AlcoholUse, data = .)

output(result)
-------------------------------------------------------------------------------------------------------------------------------
Question 8

result <- sleep_data %>%
  mutate(Genders = ifelse(Gender %in% c(1),  "Male", "Female")) %>%
  t.test(Drinks ~ Genders, data = .)

output(result)
-------------------------------------------------------------------------------------------------------------------------------
Question 9

result <- sleep_data %>%
  filter(Stress %in% c("normal", "high")) %>%
  t.test(WeekdayBed ~ Stress, data = .)

output(result)
-------------------------------------------------------------------------------------------------------------------------------
Question 10

result <- sleep_data %>%
  mutate(YearGroups = ifelse(ClassYear %in% c(1, 2),
                           "First & Second Year", "Third & Fourth Year")) %>%
  t.test(WeekendSleep ~ YearGroups, data = .)

output(result)
-------------------------------------------------------------------------------------------------------------------------------
```