(1) Introduction

The purpose of this report is to analyze the SleepStudy dataset, which provides a comprehensive look into various sleep habits, academic performance, and lifestyle factors among college students. This dataset includes information such as sleep patterns (bedtimes and average sleep duration), academic metrics (GPA and classes missed), cognitive scores, mental health indicators, and alcohol consumption habits. By examining these variables, we aim to uncover relationships and trends that shed light on how different lifestyle choices and demographic factors influence academic and personal outcomes in college life.

In this analysis, statistical methods are used to address key research questions. These questions explore differences in GPA by gender, cognitive skills between “larks” and “owls,” academic engagement among students with early classes, and more. To address them, we conduct two-sample t-tests.

A t-test is a statistical method used to determine whether there is a significant difference between the means of two groups, that is, whether an observed difference in sample means is likely to reflect a true population difference or could plausibly be due to random variation. The test begins with a null hypothesis, which assumes there is no difference between the group means, and an alternative hypothesis, which proposes that a difference does exist (or, for a one-sided test, that it lies in a specified direction). The t-test calculates a t-statistic, which measures the difference in means relative to its standard error; values farther from zero indicate a difference that is large compared with the sampling variability. All tests in this report use Welch's version of the t-test, which does not assume equal variances in the two groups; its degrees of freedom (df) are estimated from the sample sizes and sample variances, which is why they are usually not whole numbers. The df determine the shape of the t-distribution used to compute the p-value. The p-value is the probability of observing a difference at least as extreme as the one in the sample if the null hypothesis were true; a small p-value (typically below 0.05) is taken as evidence of a significant difference between groups. The test also provides a confidence interval for the difference in means; for a two-sided test, the 95% interval excludes zero exactly when the p-value is below 0.05. Finally, the mean estimates for each group indicate the direction and magnitude of the difference. T-tests are versatile and widely used in fields like psychology, medicine, and education to test hypotheses and draw conclusions from data.
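To make these mechanics concrete, the R sketch below computes Welch's t-statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value, and the 95% confidence interval by hand, then compares them with R's built-in t.test(). The two groups are simulated, hypothetical data rather than SleepStudy values, and the helper name welch_by_hand is introduced only for this illustration.

```r
# Welch's two-sample t-test computed by hand and compared with t.test().
# The data are simulated (hypothetical), not drawn from SleepStudy.
set.seed(1)
x <- rnorm(40, mean = 3.3, sd = 0.40)   # hypothetical group A
y <- rnorm(60, mean = 3.1, sd = 0.45)   # hypothetical group B

welch_by_hand <- function(x, y, conf_level = 0.95) {
  vx <- var(x) / length(x)              # squared standard error of mean(x)
  vy <- var(y) / length(y)              # squared standard error of mean(y)
  se <- sqrt(vx + vy)                   # standard error of the difference
  mean_diff <- mean(x) - mean(y)
  t_stat <- mean_diff / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_stat), dof)  # two-sided p-value
  crit <- qt(1 - (1 - conf_level) / 2, dof)
  list(t = t_stat, df = dof, p = p_value,
       ci = c(mean_diff - crit * se, mean_diff + crit * se))
}

mine    <- welch_by_hand(x, y)
builtin <- t.test(x, y)                 # Welch is the default (var.equal = FALSE)
print(c(by_hand = mine$t, builtin = unname(builtin$statistic)))
print(c(by_hand = mine$df, builtin = unname(builtin$parameter)))
```

Since t.test() defaults to var.equal = FALSE, its printout is headed “Welch Two Sample t-test” and reports fractional degrees of freedom, which is the form of every result in (3) Data Analysis Results.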

(2) Data Analysis/Methodology

We begin by examining the data collection process for the SleepStudy dataset. This dataset was sourced from https://www.lock5stat.com/datapage3e.html and contains a wide range of metrics focused on the sleep habits, academic performance, and lifestyle choices of college students. The sample includes variables related to students’ sleep patterns (such as average hours of sleep, bedtime, and frequency of all-nighters), academic metrics (including GPA and classes missed), cognitive and mental health indicators (such as cognition z-scores and depression scores), and lifestyle factors like alcohol use and exercise habits. This comprehensive dataset provides a diverse view of student life, allowing us to explore the relationships between sleep, academics, and well-being.

To analyze this dataset, we developed 10 research questions focusing on various aspects of student life. Each question was analyzed in R using Welch two-sample t-tests (the default form of R's t.test() function) to identify statistically significant differences between groups. Questions 3 and 7 use one-sided alternatives; the others are two-sided. Below is a summary of the methodology for each question:

1: Is there a significant difference in the average GPA between male and female college students? We use a t-test to compare the mean GPA of male and female students.

2: Is there a significant difference in the average number of early classes between first- and second-year students and those in later years? A t-test compares the number of early classes taken by students in their first two years with the number taken by students in later years.

3: Do students who identify as “larks” have significantly better cognitive skills compared to “owls”? To explore this, we perform a t-test on cognition z-scores between larks and owls.

4: Is there a significant difference in the average number of classes missed between students with at least one early class and those without? A t-test examines missed classes across students based on their early class status.

5: Is there a significant difference in average happiness between students with moderate or higher depression scores and those with normal scores? We analyze happiness scores with a t-test based on depression level.

6: Is there a significant difference in average sleep duration between students who have pulled at least one all-nighter and those who have not? We use a t-test to compare average hours of sleep between students who reported all-nighters and those who did not.

7: Do students who abstain from alcohol have better stress scores than those reporting heavy alcohol use? This question is addressed with a t-test comparing stress scores between non-drinkers and heavy drinkers.

8: Is there a significant difference in the average number of drinks per week between male and female students? We perform a t-test to compare weekly drink counts by gender.

9: Is there a significant difference in the average weekday bedtime between students with high and normal stress? A t-test assesses differences in weekday bedtime based on stress levels.

10: Is there a significant difference in the average hours of sleep on weekends between first-year and second-year students and those in later years? We use a t-test to compare weekend sleep hours between these groups.

Each question is analyzed with supporting visualizations, such as box plots and histograms, to illustrate the findings. These methods provide a comprehensive approach to understanding the data, allowing for meaningful insights into how college students’ academic, cognitive, and personal lives intersect.

(3) Data Analysis Results

## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725

##   Gender  GPA
## 1      0 3.60
## 2      0 3.24
## 3      0 2.97
## 4      0 3.76
## 5      0 3.20
## 6      1 3.50

We begin our analysis with the first question: “Is there a significant difference in the average GPA between male and female college students?” To address this, we conducted a Welch Two Sample t-test. After cleaning the data to remove any blank or non-numeric entries, the t-test yielded a p-value of 0.0001243, indicating a statistically significant difference in average GPA between male (mean = 3.32) and female (mean = 3.12) students. This suggests that, on average, male students have a slightly higher GPA compared to female students in this sample. Although the difference is modest, the low p-value provides evidence that gender may be associated with GPA differences. A box plot of GPA by gender was also created to visually support this finding.

The box plot above displays the GPA distribution between male (coded as 0) and female (coded as 1) college students in the SleepStudy dataset. From the plot, we observe that the median GPA for male students is slightly higher than that for female students, consistent with the t-test results indicating a statistically significant difference between the groups. The interquartile ranges for both groups are similar, though males show a slightly higher upper quartile. Additionally, there is a mild outlier among male students with a GPA below 2.0, suggesting some variability within the group. Overall, while both genders display similar GPA ranges, the box plot visually supports the finding that male students, on average, have a marginally higher GPA than female students.
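As a consistency check on the Question 1 output, the printed means, t-statistic, and degrees of freedom are enough to rebuild the reported confidence interval and p-value. This minimal sketch uses only the rounded values from the printout, so agreement is to roughly four or five decimal places.

```r
# Rebuild the Question 1 interval and p-value from the printed output.
mean_0 <- 3.324901      # mean GPA, group 0
mean_1 <- 3.123725      # mean GPA, group 1
t_stat <- 3.9139
dof    <- 200.9

mean_diff  <- mean_0 - mean_1            # difference in means
se         <- mean_diff / t_stat         # implied standard error of the difference
half_width <- qt(0.975, dof) * se        # 95% two-sided margin of error
ci <- c(mean_diff - half_width, mean_diff + half_width)
p_two_sided <- 2 * pt(-abs(t_stat), dof)

print(round(ci, 4))             # compare with 0.0998 and 0.3025 in the output
print(signif(p_two_sided, 4))   # compare with 0.0001243 in the output
```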

## 
##  Welch Two Sample t-test
## 
## data:  NumEarlyClass by YearGroup
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    2.070423                    1.306306

##   ClassYear EarlyClass
## 1         4          0
## 2         4          1
## 3         4          0
## 4         1          1
## 5         4          0
## 6         4          0

Our next question asks whether there is a significant difference in the average number of early classes between students in their first two class years and those in later years. We again used a t-test, after filtering out blank entries so that only complete records were analyzed. The t-test found a significant difference, with a p-value of 4.009e-05. The mean number of early classes is 2.07 for first- and second-year students and 1.31 for students in later years. The 95% confidence interval for the difference in means ranges from 0.40 to 1.12, indicating that students in their first two years tend to have more early classes on average than those in later years. This statistically significant difference suggests that early-year students take more morning classes than their upper-year peers. A box plot of these results is shown below.

The box plot above illustrates the distribution of the number of early classes between students in their first two years and those in later years. Students in the first two years (labeled “First Two Years”) have a higher median number of early classes than students in later years (“Other Years”), which aligns with the t-test results indicating a statistically significant difference between the groups. The interquartile range (IQR) for first- and second-year students is larger, suggesting greater variability in the number of early classes for this group. The overall range is also wider for first- and second-year students, meaning the number of early classes varies more among them than among their upper-year peers. This visual representation supports the conclusion that early-year students take more early classes on average than those in later years.
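Question 2 mentions filtering blank entries. In R, blank spreadsheet cells usually arrive as NA, and the formula form of t.test() drops rows with NA in either variable through the default na.action (normally na.omit). The toy data frame below is hypothetical; it only shows that explicit filtering with complete.cases() and the formula interface's own handling give the same result.

```r
# Hypothetical toy data: NA marks a blank cell in either column.
toy <- data.frame(
  NumEarlyClass = c(2, 3, NA, 1, 0, 2, 1, NA, 4, 1),
  YearGroup = factor(c("FirstTwoYears", "FirstTwoYears", "FirstTwoYears",
                       "OtherYears", "OtherYears", NA, "OtherYears",
                       "OtherYears", "FirstTwoYears", "OtherYears"))
)

# Explicit filtering: keep only rows with no missing values
complete <- toy[complete.cases(toy), ]

# The formula interface drops incomplete rows on its own
# (via the default na.action, normally na.omit)
res_formula  <- t.test(NumEarlyClass ~ YearGroup, data = toy)
res_filtered <- t.test(NumEarlyClass ~ YearGroup, data = complete)

print(nrow(complete))                                    # rows actually used
print(c(res_formula$statistic, res_filtered$statistic))  # identical statistics
```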

## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.2115
## alternative hypothesis: true difference in means between group Lark and group Owl is greater than 0
## 95 percent confidence interval:
##  -0.1372184        Inf
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735

##   LarkOwl CognitionZscore
## 1 Neither           -0.26
## 2 Neither            1.39
## 3     Owl            0.38
## 4    Lark            1.39
## 5     Owl            1.22
## 6 Neither           -0.04

We continue with our third question, “Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) than “owls”?” Because the question has a direction, a one-sided test was used, with the alternative that larks score higher than owls. The t-test output above indicates no statistically significant difference between the groups: the p-value of 0.2115 is well above the common significance threshold of 0.05, so the observed difference in cognitive scores between “larks” and “owls” could plausibly be due to chance.

The mean cognition z-score for “larks” is 0.0902, while for “owls” it is -0.0384. Although “larks” have a slightly higher mean cognition score than “owls,” the one-sided 95% confidence interval reported by the test (-0.137 to infinity) includes zero, as does the corresponding two-sided interval (-0.1894 to 0.4466), further indicating no significant difference in cognitive skills between the two groups. A histogram comparing the groups is shown below.

The histogram above displays the distribution of cognitive skills (measured by Cognition Z-scores) across three sleep types: “Lark,” “Neither,” and “Owl.” The distribution for each sleep type is overlaid to allow comparison. The “Neither” group, represented in green, has the highest concentration around a z-score of 0, indicating a more centered distribution around the average cognitive score. The “Owl” group, shown in blue, also centers around 0 but has a broader spread, indicating a wider variability in cognitive scores. The “Lark” group, displayed in red, has fewer observations overall and appears more spread out, though it also centers around 0.

Overall, there is no distinct pattern that suggests one group consistently has higher or lower cognitive scores than the others, aligning with the t-test results that showed no significant difference in average cognitive scores between “larks” and “owls.” This histogram visually reinforces that the cognitive scores for all three groups are fairly similar in distribution.
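Because the Question 3 test is one-sided, its p-value and interval differ from their two-sided counterparts. The sketch below recovers both from the printed (rounded) output, which reconciles the one-sided bound of about -0.137 in the printout with a two-sided interval of about -0.189 to 0.447 for the same data.

```r
# Question 3: one-sided versus two-sided summaries from the printed output.
# Lark is the first group, so the difference is Lark - Owl.
mean_diff <- 0.09024390 - (-0.03836735)
t_stat    <- 0.80571
dof       <- 75.331
se        <- mean_diff / t_stat                      # implied standard error

p_greater   <- pt(t_stat, dof, lower.tail = FALSE)   # H1: Lark > Owl (as run)
p_two_sided <- 2 * pt(-abs(t_stat), dof)             # H1: Lark != Owl

lower_one_sided <- mean_diff - qt(0.95, dof) * se    # one-sided 95% lower bound
ci_two_sided    <- mean_diff + c(-1, 1) * qt(0.975, dof) * se

print(round(c(p_greater = p_greater, p_two_sided = p_two_sided), 4))
print(round(lower_one_sided, 4))
print(round(ci_two_sided, 4))
```

The two-sided p-value is twice the one-sided one here (about 0.42), so neither version of the test finds a significant difference.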

## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095

##   ClassesMissed EarlyClass
## 1             0          0
## 2             0          1
## 3            12          0
## 4             0          1
## 5             4          0
## 6             0          0

Question 4 asks, “Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class and those who had none?” To address this, a Welch Two Sample t-test was conducted, comparing the average number of classes missed between students who had early classes (group 1) and those who did not (group 0). The t-test gives a p-value of 0.1421, which is above the significance threshold of 0.05, indicating no statistically significant difference in the average number of classes missed between the two groups.

The mean number of classes missed is 2.65 for students without early classes (group 0) and 1.99 for students with early classes (group 1). The 95% confidence interval for the difference in means ranges from -0.22 to 1.54, which includes zero, further supporting the conclusion that the observed difference in classes missed is not statistically significant. This suggests that having at least one early class is not associated with the average number of classes missed in a semester. A box plot of these results is shown below, followed by a brief discussion.

The box plot above shows the distribution of classes missed by students based on their early class status, where “0” indicates no early classes and “1” indicates at least one early class. Both groups have similar medians and interquartile ranges, suggesting that the average number of classes missed does not differ substantially between students with and without early classes. However, there are several outliers in both groups, with a few students missing significantly more classes than the majority. The spread of data is slightly wider for students without early classes (group 0), but overall, the visual similarities align with the t-test results, which showed no statistically significant difference between the groups in terms of classes missed. This implies that having an early class does not appear to impact the number of classes missed on average.
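The reasoning in Question 4 treats the confidence interval and the p-value as two views of the same result. For a two-sided Welch test, the 95% interval excludes zero exactly when p < 0.05. The simulation below checks this on 200 hypothetical datasets of class-miss counts; the Poisson draws and their rates are chosen only for illustration.

```r
# Check the CI / p-value duality for two-sided Welch tests on simulated data.
set.seed(42)
agree <- logical(200)
for (i in seq_along(agree)) {
  g0 <- rpois(50, lambda = 2.6)   # hypothetical classes missed, no early class
  g1 <- rpois(80, lambda = 2.0)   # hypothetical classes missed, early class
  res <- t.test(g0, g1)           # two-sided Welch test with a 95% CI
  ci_excludes_zero <- res$conf.int[1] > 0 || res$conf.int[2] < 0
  agree[i] <- ci_excludes_zero == (res$p.value < 0.05)
}
print(all(agree))
```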

## 
##  Welch Two Sample t-test
## 
## data:  Happiness by DepressionStatus
## t = -5.6339, df = 55.594, p-value = 6.057e-07
## alternative hypothesis: true difference in means between group ModerateOrHigher and group Normal is not equal to 0
## 95 percent confidence interval:
##  -7.379724 -3.507836
## sample estimates:
## mean in group ModerateOrHigher           mean in group Normal 
##                       21.61364                       27.05742

##   Happiness DepressionStatus
## 1        28           normal
## 2        25           normal
## 3        17         moderate
## 4        32           normal
## 5        15           normal
## 6        22         moderate

Question 5 was “Is there a significant difference in the average happiness level between students with at least moderate depression and those with normal depression status?” This question is important because it examines the relationship between mental health and happiness among college students, a group that often faces high stress and mental health challenges. If students with moderate or higher depression levels report lower happiness than those with normal depression status, that underscores the link between depression and well-being. Such insights can inform campus mental health initiatives, guiding resources and support services that improve life satisfaction for students struggling with depression and promote a healthier, more supportive academic environment.

The t-test above shows a significant difference in happiness between the two groups, with a p-value of 6.057e-07. Students with moderate or higher depression have a mean happiness score of 21.61, while those with normal depression status have a higher mean of 27.06. The 95% confidence interval for the difference in means ranges from -7.38 to -3.51; it does not include zero, reinforcing the significance of the difference. This suggests that higher levels of depression are associated with noticeably lower happiness among students in this sample. Let's visualize this with the box plot shown below.

The box plot above illustrates the difference in happiness scores between students with moderate or higher depression and those with normal depression status. Students with moderate or higher depression (in red) have a lower median happiness score, around 22, compared to students with normal depression status (in blue), whose median score is closer to 27. The interquartile range for the “Normal” group is slightly narrower, indicating less variability in their happiness scores. Additionally, there are several outliers in the “Normal” group with much lower happiness scores, though these do not affect the overall trend. This visual supports the t-test results, which showed a statistically significant difference in happiness levels between the two groups, with higher levels of depression associated with lower happiness scores.
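The Question 5 output compares a pooled “ModerateOrHigher” group with “Normal,” while the raw DepressionStatus column holds lowercase labels such as “normal” and “moderate.” One way to build these two groups is sketched below. The first six toy rows copy the data preview shown with the output; the last two rows and the “severe” label are assumptions for illustration.

```r
# Pool every status other than "normal" into "ModerateOrHigher".
toy <- data.frame(
  Happiness        = c(28, 25, 17, 32, 15, 22, 19, 30),
  DepressionStatus = c("normal", "normal", "moderate", "normal",
                       "normal", "moderate", "severe", "normal")  # last two rows hypothetical
)
toy$DepressionGroup <- factor(
  ifelse(toy$DepressionStatus == "normal", "Normal", "ModerateOrHigher"),
  levels = c("ModerateOrHigher", "Normal")  # same group order as the printed output
)
print(table(toy$DepressionGroup))

res <- t.test(Happiness ~ DepressionGroup, data = toy)  # Welch two-sample t-test
print(res$estimate)
```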

## 
##  Welch Two Sample t-test
## 
## data:  AverageSleep by AllNighter
## t = 4.4256, df = 42.171, p-value = 6.666e-05
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.4366603 1.1685667
## sample estimates:
## mean in group 0 mean in group 1 
##        8.073790        7.271176

##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Question 6 explores whether there is a significant difference in average sleep duration between students who reported pulling at least one all-nighter and those who did not. The Welch Two Sample t-test results show a statistically significant difference, with a p-value of 6.666e-05. Students who did not pull an all-nighter (group 0) have a higher mean average sleep of 8.07 hours, compared to 7.27 hours for students who reported at least one all-nighter (group 1). The 95% confidence interval for the difference in means ranges from 0.44 to 1.17 hours, suggesting that students who avoid all-nighters generally get more sleep on average. This finding indicates that all-nighters may impact overall sleep duration, with students who pull all-nighters tending to have less average sleep. The resulting box plot for this question is also shown below.

The box plot above displays the average sleep duration for students who did not pull an all-nighter (“No All-Nighter”) compared to those who reported at least one all-nighter (“At Least One All-Nighter”). Students who did not pull an all-nighter have a higher median sleep duration, around 8 hours, while students with at least one all-nighter have a median closer to 7.25 hours. The interquartile range is narrower for the “No All-Nighter” group, indicating less variability in sleep duration, while the “At Least One All-Nighter” group shows a slightly wider spread. There are also a few outliers among students who did not pull an all-nighter, with some sleeping less than 6 hours or more than 9 hours. Overall, this plot visually supports the t-test results, suggesting that students who avoid all-nighters tend to get more sleep on average.

## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.7319
## alternative hypothesis: true difference in means between group Abstain and group Heavy is greater than 0
## 95 percent confidence interval:
##  -5.449477       Inf
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500

##   StressScore AlcoholUse
## 1           8   Moderate
## 2           3   Moderate
## 3           9      Light
## 4           6      Light
## 5          14   Moderate
## 6          28    Abstain

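Question 7 asks, “Do students who abstain from alcohol have significantly better stress scores than those who report heavy alcohol use?” After filtering the data to these two groups, the t-test above shows no statistically significant difference in stress scores between them. The p-value is 0.7319, well above the typical significance threshold of 0.05, suggesting that the observed difference could be due to random variation. Note that the test was run with the alternative that the Abstain mean minus the Heavy mean is greater than 0, i.e. that abstainers have higher stress scores. Because a better stress score is a lower one, the alternative that matches the question is “less than 0”; its p-value is 1 - 0.7319 ≈ 0.268, which is also not significant, so the conclusion is unchanged.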

The mean stress score for students who abstain from alcohol is 8.97, while for those who report heavy alcohol use it is slightly higher at 10.44. The two-sided 95% confidence interval for the difference in means (-6.26 to 3.33) includes zero, as does the one-sided interval printed above (-5.45 to infinity), further supporting the lack of a significant difference. This analysis suggests no strong association between alcohol abstention and lower stress scores in this sample. Instead of a box plot, a density plot is used to visualize the distributions, as shown below.

The density plot above displays the distribution of stress scores across four groups of alcohol use: Abstain, Light, Moderate, and Heavy. Students who abstain from alcohol (in pink) tend to have a concentration of lower stress scores, with a peak at lower levels and a distribution that tapers off as scores increase. Heavy alcohol users (in green) also have a peak around lower stress scores but show a wider spread across higher stress levels, suggesting slightly more variation than the Abstain group. The Light (blue) and Moderate (purple) groups display broader distributions, with peaks similar to abstainers but extended tails into higher stress levels, indicating more variability within these groups. Overall, while students who abstain from alcohol show a slightly higher concentration at lower stress levels, the distributions are generally similar across groups. This supports the t-test findings, which indicated no statistically significant difference in stress scores based on alcohol use.
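The direction of a one-sided test matters. The Question 7 output tests whether abstainers have higher stress (alternative “greater”), while the research question asks whether they have lower, i.e. better, stress scores. Both one-sided p-values can be recovered from the printed t-statistic and degrees of freedom, and the two always sum to 1. The data object named in the final comment is hypothetical.

```r
# Question 7: both one-sided p-values from the printed statistic.
# The difference is Abstain - Heavy; a lower StressScore means less stress.
t_stat <- -0.62604
dof    <- 28.733

p_greater <- pt(t_stat, dof, lower.tail = FALSE)  # H1: Abstain > Heavy (as run)
p_less    <- pt(t_stat, dof)                      # H1: Abstain < Heavy (matches question)
print(round(c(greater = p_greater, less = p_less), 4))

# With the full data this would be the following, where abstain_heavy is a
# hypothetical name for the data filtered to the Abstain and Heavy groups
# (with unused factor levels dropped):
# t.test(StressScore ~ AlcoholUse, data = abstain_heavy, alternative = "less")
```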

## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216

##   Drinks Gender
## 1     10      0
## 2      6      0
## 3      3      0
## 4      2      0
## 5      4      0
## 6      0      1

Question 8 asked, “Is there a significant difference in the average number of drinks per week between students of different genders?” After the data were filtered, the t-test returned a p-value of 7.002e-09, indicating a statistically significant difference between the two groups. Using the same coding assumed in Question 1 (0 = male, 1 = female), the mean number of drinks per week is 4.24 for group 0 and 7.54 for group 1. The 95% confidence interval for the difference in means ranges from -4.36 to -2.24, which does not include zero, further confirming the significance of the difference. These results suggest that, on average, students in group 1 consume more drinks per week than those in group 0. A box plot visualizing the data is shown below.

The box plot above shows the distribution of the average number of drinks per week for students in two gender groups, labeled as “Group 0” and “Group 1.” Group 1 has a higher median number of drinks per week, around 7.5, compared to Group 0, which has a median closer to 4. The interquartile range for Group 1 is also wider, indicating more variability in weekly drink consumption within this group. Additionally, there are a few outliers in Group 1, with some students consuming over 20 drinks per week. This visual supports the t-test results, which found a statistically significant difference between the two groups, suggesting that, on average, students in Group 1 consume more drinks per week than those in Group 0.
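Questions 1 and 8 both depend on what the 0/1 Gender codes mean. One way to keep that assumption visible is to attach labels once, so that t-test output and plots print names instead of numbers. The mapping below (0 = Male, 1 = Female) is the one assumed in this report, not something the code verifies, and it should be checked against the dataset's documentation.

```r
# Attach explicit labels to the 0/1 Gender codes.
# ASSUMPTION: 0 = Male, 1 = Female, as used in this report.
gender_code <- c(0, 0, 0, 0, 0, 1)   # first six rows of the data preview
gender <- factor(gender_code, levels = c(0, 1), labels = c("Male", "Female"))
print(table(gender))

# Applied to the full data (not run here):
# SleepStudy$GenderLabel <- factor(SleepStudy$Gender, levels = c(0, 1),
#                                  labels = c("Male", "Female"))
```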

## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543

##   WeekdayBed Stress
## 1      25.75 normal
## 2      25.70 normal
## 3      27.44 normal
## 4      23.50 normal
## 5      25.90 normal
## 6      23.80   high

Question 9 asked, “Is there a significant difference in the average weekday bedtime between students with high and normal stress (Stress = high vs. Stress = normal)?” The t-test above gives a p-value of 0.2855, indicating no statistically significant difference in average weekday bedtime between the two stress groups. Bedtimes are coded on a clock that continues past 24, so 24.0 is midnight and 25.0 is 1:00 AM. The mean weekday bedtime for students with high stress is 24.72 (around 12:43 AM), while for students with normal stress it is slightly later at 24.89 (around 12:53 AM). The 95% confidence interval for the difference in means ranges from -0.49 to 0.14, which includes zero, further supporting the lack of significance. These findings suggest that stress level is not significantly associated with weekday bedtime in this sample. The box plot for this question is shown below.

The box plot above shows the average weekday bedtime for students with high stress versus those with normal stress levels. Both groups have similar median bedtimes, around 24.7 (approximately 12:43 AM), indicating minimal difference in typical weekday bedtimes between the two stress groups. The interquartile ranges are also close, suggesting similar variability in bedtimes within each group. However, the “Normal” stress group shows a slightly wider spread and a few outliers, with some students going to bed as late as 28 (4:00 AM) or as early as 22 (10:00 PM). Overall, this visualization supports the t-test findings, indicating no significant difference in weekday bedtime based on stress level.
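The clock times quoted for Question 9 come from converting the decimal bedtime scale by hand. A small helper, hypothetical and written only for this illustration, automates the conversion by wrapping values at 24 and rounding to the nearest minute.

```r
# Convert bedtimes on a clock that runs past 24 (24 = midnight,
# 25.5 = 1:30 AM) into "HH:MM" clock times.
bed_to_clock <- function(hours) {
  total_min <- round((hours %% 24) * 60)        # minutes after midnight
  h <- as.integer((total_min %/% 60) %% 24)     # wrap a rounded 24:00 to 00:00
  m <- as.integer(total_min %% 60)
  sprintf("%02d:%02d", h, m)
}

print(bed_to_clock(c(24.715, 24.88543, 28, 22)))
```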

## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    8.213592                    8.221892

##   WeekendSleep ClassYear
## 1         5.88         4
## 2         7.25         4
## 3        10.09         4
## 4         7.25         1
## 5         7.00         4
## 6         9.00         4

The final question was, “Is there a significant difference in the average hours of sleep on weekends between first- and second-year students and students in later years?” According to the t-test above, there is no statistically significant difference in weekend sleep hours between first- and second-year students and those in later years. The p-value is 0.9618, far above the typical significance threshold of 0.05, indicating that the observed difference could be due to random variation.

The mean weekend sleep duration is 8.21 hours for first- and second-year students and 8.22 hours for students in later years. The 95% confidence interval for the difference in means ranges from -0.35 to 0.33, which includes zero, further confirming the lack of significance. This analysis suggests that year group is not notably associated with the amount of sleep students get on weekends. A box plot for this question is shown below.

The box plot above illustrates the distribution of weekend sleep hours for first- and second-year students compared to those in later years. Both groups have similar median sleep durations, around 8 hours, indicating minimal difference in weekend sleep. The interquartile ranges are also comparable, suggesting similar variability in sleep duration across the two groups. There are a few outliers in both groups, with some students sleeping considerably more or less than average on weekends. Overall, this visualization supports the t-test results, indicating no statistically significant difference in weekend sleep duration between the two year groups.

(4) Conclusion

Based on the analyses conducted, this study explored various aspects of college students’ lifestyles, including sleep patterns, academic performance, mental health, and habits, examining how these factors may vary across different demographics and behaviors.

Key findings include a significant difference in GPA between male and female students, with males showing slightly higher averages, and a significant difference in alcohol consumption between genders, with females reporting a higher average number of drinks per week (both based on the 0 = male, 1 = female coding assumed in this report). Students in their first two years also took significantly more early classes than students in later years. Additionally, students who did not report any all-nighters tended to get more sleep on average than those who did, suggesting that avoiding all-nighters may be associated with healthier sleep patterns. Mental health also showed a strong relationship with happiness, with students experiencing moderate or higher depression reporting lower happiness on average. However, several comparisons showed no significant difference: larks and owls did not differ in cognition z-scores, having an early class was not associated with the number of classes missed, abstainers and heavy drinkers did not differ significantly in stress scores, stress level was not associated with weekday bedtime, and year group was not associated with weekend sleep duration.

These results provide insight into the interactions between lifestyle factors and well-being among college students. Some variables showed notable associations, such as depression with lower happiness and all-nighters with shorter average sleep, while others, such as stress level, alcohol abstention, and academic year, showed no significant relationship with bedtime, stress scores, or weekend sleep. This analysis highlights the value of targeted mental health resources and education on healthy sleep habits, since behaviors like pulling all-nighters were associated with less sleep on average. Overall, this study contributes to a broader understanding of college life, emphasizing areas for potential support to improve student health and academic success.

(5) Appendix

Load necessary packages

```r
library(readxl)
library(ggplot2)
```

Reading the dataset

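```r
SleepStudy <- read_excel("SleepStudy.xlsx")
# Open the data viewer only in interactive sessions (View() is not useful when knitting)
if (interactive()) View(SleepStudy)
```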

Question 1: Difference in GPA between male and female students

```r
SleepStudy$Gender <- as.factor(SleepStudy$Gender)
t_test_result <- t.test(GPA ~ Gender, data = SleepStudy)
print(t_test_result)
```
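The `t.test()` call above uses a formula and leaves `var.equal` at its default of `FALSE`, so R runs Welch's two-sample t-test, which does not assume equal group variances. To show where the printed `t` and `df` come from, the sketch below recomputes the Welch t-statistic and the Welch-Satterthwaite degrees of freedom from the group means, variances and sizes. The helper name `welch_by_hand()` is illustrative only, and R's built-in `sleep` dataset stands in for `SleepStudy` so the block runs without the Excel file.

```r
# Illustrative helper: Welch's t-statistic, degrees of freedom and p-value by hand.
welch_by_hand <- function(y, g) {
  g <- as.factor(g)
  stopifnot(nlevels(g) == 2)
  x1 <- y[g == levels(g)[1]]   # first factor level, as in t.test(y ~ g)
  x2 <- y[g == levels(g)[2]]
  # squared standard error of each group mean
  se2 <- c(var(x1) / length(x1), var(x2) / length(x2))
  # difference in means relative to its estimated standard error
  t_stat <- (mean(x1) - mean(x2)) / sqrt(sum(se2))
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- sum(se2)^2 / (se2[1]^2 / (length(x1) - 1) + se2[2]^2 / (length(x2) - 1))
  p_value <- 2 * pt(-abs(t_stat), df)  # two-sided p-value
  c(t = t_stat, df = df, p = p_value)
}

# Built-in 'sleep' data: extra hours of sleep under two drugs (group 1 vs 2)
manual <- welch_by_hand(sleep$extra, sleep$group)
builtin <- t.test(extra ~ group, data = sleep)
manual
c(t = unname(builtin$statistic), df = unname(builtin$parameter), p = builtin$p.value)
```

The two printed vectors should agree, which is a quick way to confirm that each appendix test is a Welch test comparing the first factor level against the second.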

Box plot for GPA by Gender

```r
ggplot(SleepStudy, aes(x = Gender, y = GPA, fill = Gender)) +
  geom_boxplot() +
  labs(title = "GPA by Gender", x = "Gender", y = "GPA") +
  # labels are applied in factor-level order: the first level of Gender is shown as "Male"
  scale_x_discrete(labels = c("Male", "Female")) +
  theme_minimal()
```

Question 2: Difference in early classes between first two years and other years

```r
SleepStudy$YearGroup <- as.factor(ifelse(SleepStudy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears"))
t_test_result <- t.test(NumEarlyClass ~ YearGroup, data = SleepStudy)
print(t_test_result)
```

Box plot for Early Classes by Year Group

```r
ggplot(SleepStudy, aes(x = YearGroup, y = NumEarlyClass, fill = YearGroup)) +
  geom_boxplot() +
  labs(title = "Number of Early Classes by Year Group", x = "Year Group", y = "Number of Early Classes") +
  theme_minimal()
```

Question 3: Difference in cognitive skills between “larks” and “owls”

```r
lark_owl_data <- subset(SleepStudy, LarkOwl %in% c("Lark", "Owl"))
lark_owl_data$LarkOwl <- as.factor(lark_owl_data$LarkOwl)
t_test_result <- t.test(CognitionZscore ~ LarkOwl, data = lark_owl_data)
print(t_test_result)
```

Overlayed histogram for Cognitive Skills by Sleep Type

```r
ggplot(SleepStudy, aes(x = CognitionZscore, fill = LarkOwl)) +
  geom_histogram(bins = 15, alpha = 0.5, position = "identity") +
  labs(title = "Distribution of Cognitive Skills by Sleep Type", x = "Cognition Z-score", y = "Count") +
  theme_minimal()
```

Question 4: Difference in classes missed between students with and without early classes

```r
SleepStudy$EarlyClass <- as.factor(SleepStudy$EarlyClass)
t_test_result <- t.test(ClassesMissed ~ EarlyClass, data = SleepStudy)
print(t_test_result)
```

Box plot for Classes Missed by Early Class Status

```r
ggplot(SleepStudy, aes(x = EarlyClass, y = ClassesMissed, fill = EarlyClass)) +
  geom_boxplot() +
  labs(title = "Classes Missed by Early Class Status", x = "Early Class", y = "Classes Missed") +
  theme_minimal()
```

Question 5: Difference in happiness levels based on depression status

```r
SleepStudy$DepressionStatus <- as.factor(ifelse(SleepStudy$DepressionScore >= 10, "ModerateOrHigher", "Normal"))
t_test_result <- t.test(Happiness ~ DepressionStatus, data = SleepStudy)
print(t_test_result)
```

Box plot for Happiness by Depression Status

```r
ggplot(SleepStudy, aes(x = DepressionStatus, y = Happiness, fill = DepressionStatus)) +
  geom_boxplot() +
  labs(title = "Happiness Levels by Depression Status", x = "Depression Status", y = "Happiness Score") +
  theme_minimal()
```

Question 6: Difference in average sleep based on all-nighter status

```r
SleepStudy$AllNighter <- as.factor(SleepStudy$AllNighter)
t_test_result <- t.test(AverageSleep ~ AllNighter, data = SleepStudy)
print(t_test_result)
```

Box plot for Average Sleep by All-Nighter Status

```r
ggplot(SleepStudy, aes(x = AllNighter, y = AverageSleep, fill = AllNighter)) +
  geom_boxplot() +
  labs(title = "Average Sleep by All-Nighter Status", x = "All-Nighter", y = "Average Sleep (hours)") +
  theme_minimal()
```

Question 7: Difference in stress scores based on alcohol use (Abstain vs. Heavy)

```r
filtered_data <- subset(SleepStudy, AlcoholUse %in% c("Abstain", "Heavy"))
filtered_data$AlcoholUse <- as.factor(filtered_data$AlcoholUse)
t_test_result <- t.test(StressScore ~ AlcoholUse, data = filtered_data)
print(t_test_result)
```

Density plot of Stress Scores by Alcohol Use

```r
ggplot(SleepStudy, aes(x = StressScore, fill = AlcoholUse)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot of Stress Scores by Alcohol Use", x = "Stress Score", fill = "Alcohol Use") +
  theme_minimal()
```

Question 8: Difference in drinks per week by gender

```r
SleepStudy$Gender <- as.factor(SleepStudy$Gender)
t_test_result <- t.test(Drinks ~ Gender, data = SleepStudy)
print(t_test_result)
```

Box plot for Drinks by Gender

```r
ggplot(SleepStudy, aes(x = Gender, y = Drinks, fill = Gender)) +
  geom_boxplot() +
  labs(title = "Average Number of Drinks per Week by Gender", x = "Gender", y = "Drinks per Week") +
  theme_minimal()
```

Question 9: Difference in weekday bedtime based on stress level

```r
filtered_data <- subset(SleepStudy, Stress %in% c("high", "normal"))
filtered_data$Stress <- as.factor(filtered_data$Stress)
t_test_result <- t.test(WeekdayBed ~ Stress, data = filtered_data)
print(t_test_result)
```

Box plot for Weekday Bedtime by Stress Level

```r
ggplot(SleepStudy, aes(x = Stress, y = WeekdayBed, fill = Stress)) +
  geom_boxplot() +
  labs(title = "Weekday Bedtime by Stress Level", x = "Stress Level", y = "Weekday Bedtime") +
  theme_minimal()
```

Question 10: Difference in weekend sleep hours between year groups

```r
SleepStudy$YearGroup <- as.factor(ifelse(SleepStudy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears"))
t_test_result <- t.test(WeekendSleep ~ YearGroup, data = SleepStudy)
print(t_test_result)
```

Box plot for Weekend Sleep Hours by Year Group

```r
ggplot(SleepStudy, aes(x = YearGroup, y = WeekendSleep, fill = YearGroup)) +
  geom_boxplot() +
  labs(title = "Weekend Sleep Hours by Year Group", x = "Year Group", y = "Weekend Sleep (hours)") +
  theme_minimal()
```
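Every question in this appendix follows the same pattern: build a two-level grouping, call `t.test()`, and print the result. One possible way to collect such tests into a single table of group means, t-statistics, degrees of freedom, p-values and confidence intervals is sketched below. This is not how the results in this report were produced; `tidy_welch()` is a hypothetical helper, and the demonstration uses the built-in `mtcars` data so it runs without `SleepStudy.xlsx`. With the study data, a call such as `tidy_welch(GPA ~ Gender, SleepStudy)` would be expected to work once the grouping columns above exist.

```r
# Hypothetical helper: one row of Welch t-test results per formula.
tidy_welch <- function(formula, data) {
  res <- t.test(formula, data = data)  # Welch test (var.equal = FALSE by default)
  data.frame(
    comparison  = deparse(formula),
    mean_group1 = unname(res$estimate[1]),
    mean_group2 = unname(res$estimate[2]),
    t           = unname(res$statistic),
    df          = unname(res$parameter),
    p_value     = res$p.value,
    ci_lower    = res$conf.int[1],
    ci_upper    = res$conf.int[2]
  )
}

# Demonstration on built-in data; each right-hand side must have exactly two levels
formulas <- list(mpg ~ am, hp ~ vs, wt ~ am)
results <- do.call(rbind, lapply(formulas, tidy_welch, data = mtcars))
print(results, digits = 3)
```

Keeping each comparison as a formula makes the set of tests easy to extend, and the confidence-interval columns show directly whether each interval for the difference in means excludes zero.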

Project #2, College Students Sleep Patterns

KL

2024-11-25