# Load the packages used throughout the report
library(readxl)   # read_excel()
library(dplyr)    # %>%, filter(), mutate()
library(ggplot2)  # ggplot() and related plotting functions

Introduction

This report analyzes the sleep patterns and associated factors among college students using the “SleepStudy” dataset obtained from Lock5Stat Datasets. The dataset consists of 253 observations on 27 variables, encompassing various aspects of students’ lives, such as academic performance, mental health, and lifestyle choices.

The objective of this analysis is to answer key research questions that delve into the relationships between sleep habits, psychological well-being, and academic performance. Insights gained will contribute to understanding the factors that influence students’ overall well-being and academic success.

Data Preparation

# Define the file path
file_path <- "C:/Users/jkane/Documents/project2/SleepStudy.xlsx"

# Load the dataset
sleep_data <- read_excel(file_path, sheet = "SleepStudy")

# Convert Gender to a factor with explicit labels (coded 0 = Female, 1 = Male)
sleep_data$Gender <- factor(sleep_data$Gender, levels = c(0, 1), labels = c("Female", "Male"))

# Preview the data
head(sleep_data)
## # A tibble: 6 × 27
##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass   GPA ClassesMissed
##   <fct>      <dbl> <chr>           <dbl>      <dbl> <dbl>         <dbl>
## 1 Female         4 Neither             0          0  3.6              0
## 2 Female         4 Neither             2          1  3.24             0
## 3 Female         4 Owl                 0          0  2.97            12
## 4 Female         1 Lark                5          1  3.76             0
## 5 Female         4 Owl                 0          0  3.2              4
## 6 Male           4 Neither             0          0  3.5              0
## # ℹ 20 more variables: CognitionZscore <dbl>, PoorSleepQuality <dbl>,
## #   DepressionScore <dbl>, AnxietyScore <dbl>, StressScore <dbl>,
## #   DepressionStatus <chr>, AnxietyStatus <chr>, Stress <chr>, DASScore <dbl>,
## #   Happiness <dbl>, AlcoholUse <chr>, Drinks <dbl>, WeekdayBed <dbl>,
## #   WeekdayRise <dbl>, WeekdaySleep <dbl>, WeekendBed <dbl>, WeekendRise <dbl>,
## #   WeekendSleep <dbl>, AverageSleep <dbl>, AllNighter <dbl>
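
The Gender recoding step can be checked in isolation. The following is a minimal sketch on a hypothetical four-element vector of 0/1 codes (not values from the SleepStudy file), assuming the 0 = Female, 1 = Male coding used in this report. It also shows why a factor that has already been labelled should not be recoded from 0/1 a second time: the labels no longer match the numeric levels, so every value becomes NA.

```r
# Hypothetical 0/1 codes for illustration only (not taken from SleepStudy)
raw_gender <- c(0, 1, 1, 0)

# Explicit levels and labels make the mapping visible in the code
gender <- factor(raw_gender, levels = c(0, 1), labels = c("Female", "Male"))
print(gender)

# Recoding the already-labelled factor from 0/1 again matches nothing,
# because its values are now "Female"/"Male" rather than 0/1
recoded_again <- factor(gender, levels = c(0, 1), labels = c("Female", "Male"))
print(recoded_again)  # every element is NA
```

Keeping the recoding in a single place avoids this kind of silent data loss later in the analysis.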

Question 1: Is there a significant difference in the average GPA between male and female college students?

Null Hypothesis (H₀): The mean GPA is the same for male and female students.

Alternative Hypothesis (H₁): The mean GPA differs between male and female students.

# Perform a Welch two-sample t-test of GPA by gender
t_test <- t.test(GPA ~ Gender, data = sleep_data)

# Print the results
t_test
## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group Female   mean in group Male 
##             3.324901             3.123725

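The p-value (0.0001243) is well below 0.05, and the 95% confidence interval for the difference in means (0.10 to 0.30) excludes 0, so we reject H₀: female students have a significantly higher mean GPA (3.32) than male students (3.12).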
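
Reading the p-value and confidence interval off the printed output by hand invites transcription slips, so it can help to pull them from the test object directly. The sketch below uses R's built-in `sleep` dataset purely as a stand-in, since the SleepStudy file is not bundled with this report. The helper name `summarise_ttest` is hypothetical and not part of any package.

```r
# Stand-in data: R's built-in `sleep` dataset (NOT the SleepStudy file)
welch <- t.test(extra ~ group, data = sleep)

# Hypothetical helper: collect the key numbers from an htest object
summarise_ttest <- function(tt, alpha = 0.05) {
  ci <- tt$conf.int
  data.frame(
    t_stat = unname(tt$statistic),
    df = unname(tt$parameter),
    p_value = tt$p.value,
    ci_lower = ci[1],
    ci_upper = ci[2],
    ci_excludes_zero = ci[1] > 0 || ci[2] < 0,
    significant = tt$p.value < alpha
  )
}

result <- summarise_ttest(welch)
print(result)
```

Applied to the report's own test objects (for example `summarise_ttest(t_test)`), the same helper would give the numbers quoted in each interpretation.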

Visualization For Question 1

ggplot(sleep_data, aes(x = Gender, y = GPA, fill = Gender)) +
  geom_boxplot() +
  labs(title = "GPA by Gender", x = "Gender", y = "GPA") +
  theme_minimal()

Conclusion For Question 1:

  1. Female GPA: Slightly higher on average than male GPA. The median line inside the box for females sits above the median line for males.

  2. Male GPA: Shows a broader spread, suggesting more variability than among females.

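Question 2: Is there a significant difference in the average number of early classes between the first two class years and other class years?

Null Hypothesis (H₀): The mean number of early classes is the same for students in the first two class years and students in other class years.

Alternative Hypothesis (H₁): The mean number of early classes differs between the two groups.
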
# Create a boxplot for the number of early classes by ClassYear
ggplot(sleep_data, aes(x = as.factor(ClassYear), y = NumEarlyClass, fill = as.factor(ClassYear))) +
  geom_boxplot() +
  labs(
    title = "Number of Early Classes by Class Year",
    x = "Class Year",
    y = "Number of Early Classes"
  ) +
  theme_minimal()

Insight For Question 2:

First-Year Students (Freshmen): Freshmen show the widest range in the number of early classes, with some students taking as many as 4 or 5 classes before 9 AM. Their median number of early classes is also higher than that of students in later years.

Second-Year Students (Sophomores): Sophomores have a similar median number of early classes to freshmen, around 2 per week. However, their early class schedules show slightly less variation compared to freshmen.

Third-Year Students (Juniors) and Fourth-Year Students (Seniors): Juniors and seniors generally take fewer early classes, with both groups having lower medians and less variability compared to first- and second-year students. Early classes appear to be less common for students in these later years.

Overall Insight: The data suggests that early classes are more common among first-year and second-year students, while juniors and seniors tend to have fewer early classes in their schedules. This trend could be due to differences in course availability or preferences as students progress through their academic careers.
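
The boxplot compares all four class years, whereas the question asks about the first two years versus the rest. One possible way to test that directly is sketched below, using the same two-group recoding that Question 10 applies to `ClassYear`. The data frame `toy` contains made-up values for illustration only, so its output says nothing about the real dataset.

```r
# Hypothetical toy values for illustration only (NOT taken from SleepStudy)
toy <- data.frame(
  ClassYear = c(1, 1, 2, 2, 3, 3, 4, 4),
  NumEarlyClass = c(3, 2, 2, 3, 1, 0, 1, 2)
)

# Collapse the four class years into the two groups named in the question
toy$YearGroup <- ifelse(toy$ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears")

# Welch two-sample t-test on the grouped variable
early_test <- t.test(NumEarlyClass ~ YearGroup, data = toy)
print(early_test$estimate)  # group means: 2.5 and 1.0 for the toy values
```

Replacing `toy` with `sleep_data` would run the same comparison on the actual observations.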

Question 3: Do students who identify as “larks” (morning people) have significantly better cognitive skills compared to “owls” (evening people)?

Larks (Morning People)

Students who identify as “larks” tend to have slightly higher average cognitive skills (measured by cognition z-scores). The distribution of scores for larks shows some variability, but their median score is generally higher than that of “owls.”

Owls (Evening People)

Students who identify as “owls” have slightly lower average cognitive scores compared to larks. Their scores are more tightly clustered, indicating less variability, but their overall performance tends to be marginally lower.

ggplot(sleep_data, aes(x = LarkOwl, y = CognitionZscore, fill = LarkOwl)) +
  geom_boxplot() +
  labs(
    title = "Cognitive Skills by Chronotype (Lark vs. Owl)",
    x = "Chronotype (Morning vs. Evening)",
    y = "Cognition Z-Score"
  ) +
  theme_minimal()

Insight for Question 3:

Larks (Morning People): Students who identify as “larks” tend to have slightly higher average cognitive skills (measured by cognition z-scores) compared to “owls.” Their scores show more variability, with some students performing much higher than the average. The median cognitive score for larks is higher, indicating they generally perform better than evening-oriented students.

Owls (Evening People): Students who identify as “owls” have slightly lower average cognitive scores compared to larks. Their scores are more tightly clustered around the mean, showing less variability within the group. However, their overall cognitive performance is slightly below that of morning-oriented students.

Overall Insight: The data suggests that “larks” may perform marginally better in cognitive tests compared to “owls.” However, the difference is small, and statistical analysis reveals it is not significant. This indicates that a student’s cognitive performance is likely not strongly influenced by whether they identify as a morning or evening person. Other factors may play a more significant role in cognitive ability.
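
`LarkOwl` has three values (Lark, Neither, Owl), and the formula interface of `t.test()` requires exactly two groups, so a lark-versus-owl comparison needs a filtering step first. The sketch below shows one way to do this on made-up values. The data frame `toy` is hypothetical, and the one-sided `alternative = "greater"` is an assumption chosen to match the directional wording of the question.

```r
# Hypothetical toy values for illustration only (NOT taken from SleepStudy)
toy <- data.frame(
  LarkOwl = c("Lark", "Lark", "Lark", "Owl", "Owl", "Owl", "Neither", "Neither"),
  CognitionZscore = c(0.4, -0.1, 0.6, -0.3, 0.2, -0.5, 0.1, 0.0)
)

# With all three chronotypes present, the two-sample test refuses to run
three_groups <- try(t.test(CognitionZscore ~ LarkOwl, data = toy), silent = TRUE)
print(inherits(three_groups, "try-error"))  # TRUE

# Keep only the two groups named in the research question
lark_owl <- subset(toy, LarkOwl %in% c("Lark", "Owl"))

# One-sided Welch test: is the Lark mean greater than the Owl mean?
# ("Lark" sorts before "Owl", so it is the first group in the comparison)
cog_test <- t.test(CognitionZscore ~ LarkOwl, data = lark_owl, alternative = "greater")
print(cog_test$p.value)
```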

Question 4

Research Question: Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass = 1) and those who didn’t (EarlyClass = 0)?

# Perform a t-test for the number of missed classes by early class indicator
t_test_missed <- t.test(ClassesMissed ~ EarlyClass, data = sleep_data)

# Print the t-test results
print(t_test_missed)
## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095
# Create a boxplot for missed classes by early class indicator
ggplot(sleep_data, aes(x = as.factor(EarlyClass), y = ClassesMissed, fill = as.factor(EarlyClass))) +
  geom_boxplot() +
  labs(
    title = "Number of Classes Missed by Early Class Indicator",
    x = "Early Class (0 = No, 1 = Yes)",
    y = "Number of Classes Missed"
  ) +
  scale_fill_manual(values = c("0" = "red", "1" = "blue"), name = "Early Class") +
  theme_minimal()

Interpretation

Students Without Early Classes (No Classes Before 9 AM): Students without early classes miss more classes on average, with a mean of 2.65 classes missed per semester. Their number of missed classes shows greater variability, with some students missing as many as 8 or more classes. The median number of missed classes is higher compared to students with early classes.

Students With Early Classes (At Least One Class Before 9 AM): Students with early classes miss fewer classes on average, with a mean of 1.99 classes missed per semester. Their distribution of missed classes is less spread out, indicating greater consistency in class attendance. The median number of missed classes is lower than students without early classes.

Overall Insight: Although students with early classes appear to miss fewer classes than those without early classes, the difference is not statistically significant (p-value = 0.142). The 95% confidence interval (-0.22 to 1.54) includes 0, which is consistent with this conclusion. This suggests that having early classes does not strongly influence the number of missed classes. Other factors, such as personal habits or academic workload, might play a larger role in attendance patterns.

Question 5:

Question 6

Research Question: Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter = 1) and those who didn’t (AllNighter = 0)?
# Perform a t-test for sleep quality by all-nighter indicator
t_test_sleep <- t.test(PoorSleepQuality ~ AllNighter, data = sleep_data)

# Print the t-test results
print(t_test_sleep)
## 
##  Welch Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9456958  0.1608449
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412
# Create a boxplot for sleep quality by all-nighter indicator
ggplot(sleep_data, aes(x = as.factor(AllNighter), y = PoorSleepQuality, fill = as.factor(AllNighter))) +
  geom_boxplot() +
  labs(
    title = "Sleep Quality by All-Nighter Indicator",
    x = "All-Nighter (0 = No, 1 = Yes)",
    y = "Sleep Quality (Higher = Worse)"
  ) +
  scale_fill_manual(values = c("0" = "blue", "1" = "red"), name = "All-Nighter") +
  theme_minimal()

Interpretation

Students Without All-Nighters (AllNighter = 0): Students who did not pull any all-nighters reported slightly better average sleep quality scores (6.14 on average, where lower values indicate better sleep). Their scores are more consistent, with less variation compared to students who pulled all-nighters.

Students With All-Nighters (AllNighter = 1): Students who reported pulling at least one all-nighter had worse sleep quality scores on average (7.03), with a wider range of variability. This is consistent with all-nighters being associated with poorer sleep for some students, although the comparison alone cannot show that one causes the other.

Overall Insight: Although students with all-nighters appear to have slightly worse sleep quality scores on average compared to those without, the difference is not statistically significant (p-value = 0.095). This indicates that the observed difference in sleep quality between the two groups could be due to random chance.

Question 7:

Research Question: Do students who abstain from alcohol use have significantly better stress scores compared to those who report heavy alcohol use?

# Filter data for Abstain and Heavy alcohol use
filtered_data <- sleep_data %>%
  filter(AlcoholUse %in% c("Abstain", "Heavy")) %>%
  droplevels()

# Perform a t-test for stress scores by alcohol use group
t_test_stress <- t.test(StressScore ~ AlcoholUse, data = filtered_data)

# Print the t-test results
print(t_test_stress)
## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500
# Create a boxplot for stress scores by alcohol use group
ggplot(filtered_data, aes(x = AlcoholUse, y = StressScore, fill = AlcoholUse)) +
  geom_boxplot() +
  labs(
    title = "Stress Scores by Alcohol Use",
    x = "Alcohol Use (Abstain vs. Heavy)",
    y = "Stress Score"
  ) +
  scale_fill_manual(values = c("Abstain" = "green", "Heavy" = "red"), name = "Alcohol Use") +
  theme_minimal()

Interpretation

Students Who Abstain from Alcohol (AlcoholUse = “Abstain”): Students who abstain from alcohol use reported slightly lower average stress scores (8.97) compared to students who report heavy alcohol use. Their stress scores showed slightly less variability.

Students Who Report Heavy Alcohol Use (AlcoholUse = “Heavy”): Students who report heavy alcohol use had slightly higher average stress scores (10.44). Their stress scores exhibited greater variability, indicating that the effect of alcohol use on stress may differ among individuals in this group.

Overall Insight: Although students who abstain from alcohol have lower average stress scores compared to heavy alcohol users, the difference is not statistically significant (p-value = 0.536). This suggests that alcohol use level might not have a strong or consistent impact on stress scores.

Question 8

Is there a significant difference in the average number of drinks per week between students of different genders?

 # Perform a t-test for drinks per week by gender
t_test_drinks <- t.test(Drinks ~ Gender, data = sleep_data)

# Print the t-test results
print(t_test_drinks)
## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group Female   mean in group Male 
##             4.238411             7.539216
# Step 1: Clean the data (Gender is already a labelled factor from Data Preparation,
# so it must not be recoded from 0/1 again; doing so would turn every value into NA)
cleaned_data <- sleep_data %>%
  filter(!is.na(Gender) & !is.na(Drinks))

# Step 2: Create the boxplot
ggplot(cleaned_data, aes(x = Gender, y = Drinks, fill = Gender)) +
  geom_boxplot() +
  labs(
    title = "Average Drinks Per Week by Gender",
    x = "Gender",
    y = "Number of Drinks Per Week"
  ) +
  scale_fill_manual(values = c("Female" = "pink", "Male" = "blue")) +
  theme_minimal()

Interpretation

Female Students (Gender = “Female”): Female students reported consuming fewer drinks per week, with an average of 4.24 drinks. Their drinking habits showed less variability, with most female students consuming between 3 and 7 drinks per week. There are fewer extreme outliers in this group compared to male students.

Male Students (Gender = “Male”): Male students reported consuming significantly more drinks per week, with an average of 7.54 drinks. Their drinking habits showed greater variability, with some male students consuming 20 or more drinks per week, as highlighted by the outliers.

Overall Insight: The data indicates that male students consume more alcoholic drinks per week on average compared to female students. Male students also exhibit more variability in their drinking habits. This gender-based disparity in alcohol consumption is statistically significant, with a t-test confirming a p-value below 0.001.

Question 9: Weekday Bedtime by Stress Level

Is there a significant difference in the average weekday bedtime between students with high stress (Stress = “high”) and those with normal stress (Stress = “normal”)?

# Perform a t-test for weekday bedtime by stress level
t_test_bedtime <- t.test(WeekdayBed ~ Stress, data = sleep_data)

# Print the t-test results
print(t_test_bedtime)
## 
##  Welch Two Sample t-test
## 
## data:  WeekdayBed by Stress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means between group high and group normal is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
##   mean in group high mean in group normal 
##             24.71500             24.88543
# Create a boxplot for weekday bedtime by stress level
# The fill names must match the data's values exactly ("high" and "normal", lowercase)
ggplot(sleep_data, aes(x = Stress, y = WeekdayBed, fill = Stress)) +
  geom_boxplot() +
  labs(
    title = "Weekday Bedtime by Stress Level",
    x = "Stress Level (high vs. normal)",
    y = "Weekday Bedtime (24 = Midnight)"
  ) +
  scale_fill_manual(values = c("normal" = "green", "high" = "red")) +
  theme_minimal()

Interpretation

Students with Normal Stress (Stress = “normal”): Students with normal stress levels reported slightly later weekday bedtimes, with an average of 24.89 (approximately 12:53 AM). Their bedtimes exhibited less variability compared to students with high stress.

Students with High Stress (Stress = “High”): Students with high stress levels reported slightly earlier weekday bedtimes, with an average of 24.72 (approximately 12:43 AM). However, their bedtimes showed slightly more variability, suggesting individual differences in bedtime routines under stress.

Overall Insight: While students with normal stress levels tend to go to bed slightly later on average than those with high stress, the difference is not statistically significant (p-value = 0.286). This indicates that stress level does not have a strong or consistent impact on weekday bedtime.
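
The clock times quoted in this interpretation come from converting decimal hours, where 24 means midnight, into hours and minutes. A minimal sketch of that arithmetic follows. The function name `decimal_to_clock` is hypothetical, and the two inputs are the group means printed in the t-test output above.

```r
# Hypothetical helper: convert decimal hours (24 = midnight) to a 12-hour clock string
decimal_to_clock <- function(hours) {
  total_minutes <- round((hours %% 24) * 60)   # wrap past midnight, then round to minutes
  h24 <- (total_minutes %/% 60) %% 24          # hour on a 24-hour clock
  m <- total_minutes %% 60                     # leftover minutes
  suffix <- ifelse(h24 < 12, "AM", "PM")
  h12 <- ifelse(h24 %% 12 == 0, 12, h24 %% 12) # 0 and 12 both display as 12
  sprintf("%d:%02d %s", as.integer(h12), as.integer(m), suffix)
}

# Group means from the WeekdayBed t-test: normal stress, then high stress
decimal_to_clock(c(24.88543, 24.71500))
# "12:53 AM" "12:43 AM"
```

For example, 24.88543 wraps to 0.88543 hours after midnight, and 0.88543 × 60 ≈ 53 minutes, giving 12:53 AM.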

Question 10

Is there a significant difference in the average hours of sleep on weekends between first- and second-year students (combined) and other students?

# Create a new variable for YearGroup
sleep_data <- sleep_data %>%
  mutate(YearGroup = ifelse(ClassYear %in% c(1, 2), "FirstTwoYears", "OtherYears"))

# Perform a t-test for WeekendSleep by YearGroup
t_test_weekend_sleep <- t.test(WeekendSleep ~ YearGroup, data = sleep_data)

# Print the t-test results
print(t_test_weekend_sleep)
## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    8.213592                    8.221892
# Create a boxplot for weekend sleep by YearGroup
ggplot(sleep_data, aes(x = YearGroup, y = WeekendSleep, fill = YearGroup)) +
  geom_boxplot() +
  labs(
    title = "Weekend Sleep Hours by Class Year Group",
    x = "Year Group (FirstTwoYears vs OtherYears)",
    y = "Weekend Sleep Hours"
  ) +
  scale_fill_manual(values = c("FirstTwoYears" = "orange", "OtherYears" = "purple")) +
  theme_minimal()

Interpretation

First- and Second-Year Students (YearGroup = “FirstTwoYears”): First- and second-year students reported an average of 8.21 hours of sleep on weekends. Their sleep schedules showed a moderate level of variability, but most students slept around the average.

Other Students (YearGroup = “OtherYears”): Other students (third- and fourth-year students) reported a nearly identical average of 8.22 hours of sleep on weekends. Similarly, their sleep schedules also showed moderate variability.

Overall Insight: The difference in weekend sleep hours between first- and second-year students and other students is negligible (mean difference = 0.008 hours) and not statistically significant (p-value = 0.962). This suggests that academic standing (early vs. late years) does not significantly affect weekend sleep patterns.

Overall Conclusion

This analysis highlights key behavioral and academic patterns among college students, providing insights into the relationships between sleep, stress, academic performance, and lifestyle choices. While some differences were statistically significant (e.g., GPA by gender, drinks per week by gender), others (e.g., weekday bedtime by stress, weekend sleep by class year) were not, and further investigation is needed to understand the underlying factors.

These findings can inform interventions aimed at improving the well-being and academic success of college students, particularly in areas such as depression, sleep quality, and time management.
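
To see at a glance which comparisons reached significance, the p-values printed in the t-test outputs of this report can be collected into one table. The sketch below copies those seven values by hand. The 0.05 threshold and the short question labels are assumptions made for illustration.

```r
# p-values copied from the Welch t-test outputs shown in this report
results <- data.frame(
  question = c(
    "Q1: GPA by gender",
    "Q4: classes missed by early class",
    "Q6: sleep quality by all-nighter",
    "Q7: stress by alcohol use",
    "Q8: drinks by gender",
    "Q9: weekday bedtime by stress",
    "Q10: weekend sleep by year group"
  ),
  p_value = c(0.0001243, 0.1421, 0.09479, 0.5362, 7.002e-09, 0.2855, 0.9618)
)

# Flag results below the conventional 0.05 threshold (an assumed cutoff)
results$significant <- results$p_value < 0.05
print(results)
```

Only the two gender comparisons (GPA and drinks per week) fall below the threshold. Questions 2 and 3 do not appear because no test output is printed for them.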