Introduction

This report presents an analysis of sleep patterns among college students, using the “SleepStudy” dataset retrieved from https://www.lock5stat.com/datasets3e/SleepStudy.csv. The dataset comprises 253 observations on 27 variables covering students' sleep and the attributes affecting it, their mental states, and their social behaviors.

The primary objective of this analysis is to answer a series of statistics-based research questions derived from the dataset. The questions were taken from the D2L project 2 example. They examine students' sleep and the attributes affecting it, their mental states, and their social behaviors. By working through these questions we can gain experience with using datasets in research projects.

Questions

  1. Is there a significant difference in the average GPA between male and female college students?

  2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

  3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

  4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

  5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

  6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

  7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

  8. Is there a significant difference in the average number of drinks per week between students of different genders?

  9. Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?

  10. Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

DataSet

Sleep = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(Sleep)
##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Analysis

Q1. Is there a significant difference in the average GPA between male and female college students?

# Assumption: Gender == 0 is male
avg_gpa_male <- Sleep$GPA[Sleep$Gender == 0]
avg_gpa_female <- Sleep$GPA[Sleep$Gender == 1]

t.test(avg_gpa_male, avg_gpa_female)
## 
##  Welch Two Sample t-test
## 
## data:  avg_gpa_male and avg_gpa_female
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean of x mean of y 
##  3.324901  3.123725

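Yes. Assuming that Gender == 0 is male, there is a significant difference between male and female average GPA: the Gender == 0 group averages 3.32 and the Gender == 1 group averages 3.12.

Summary: I built two GPA variables, one for Gender == 0 and one for Gender == 1, assuming that 0 is male. That assumption should be checked against the dataset documentation; if the coding is reversed, the group labels swap but the p-value stays the same. Then I plugged both variables into a t-test and got a p-value = 0.0001243, which is less than .05, so the difference is significant. The 95 percent confidence interval for the difference in means, 0.10 to 0.30, does not contain 0.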
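The decision rule used throughout this report can be written down as a short sketch. The names alpha and is_significant are illustrative and do not come from the original analysis; the p-values are copied from the t-test outputs in this report.

```r
# Decision rule: reject the null hypothesis of equal means when p < alpha.
# `is_significant` is a hypothetical helper, not part of base R.
is_significant <- function(p_value, alpha = 0.05) {
  p_value < alpha
}

is_significant(0.0001243)  # Q1 p-value: TRUE, the difference is significant
is_significant(0.5864)     # Q2 p-value: FALSE, the difference is not significant
```

A small p-value means the observed difference would be unlikely if the two group means were really equal, which is why p < .05 counts as evidence of a difference rather than against it.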

Q2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

# No ClassYear value is greater than 4, so ClassYear > 2 covers years 3 and 4.
# Caution: ClassYear == 1 + 2 is evaluated as ClassYear == 3.
early_class_2year <- Sleep$NumEarlyClass[Sleep$ClassYear == 1 + 2]
early_class_morethan2year <- Sleep$NumEarlyClass[Sleep$ClassYear > 2]


t.test(early_class_2year, early_class_morethan2year)
## 
##  Welch Two Sample t-test
## 
## data:  early_class_2year and early_class_morethan2year
## t = 0.54589, df = 97.86, p-value = 0.5864
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3152376  0.5544768
## sample estimates:
## mean of x mean of y 
##  1.425926  1.306306

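No. The test found no significant difference in the average number of early classes between the two groups it compared (1.43 vs. 1.31). Those groups are not the ones the question asks about: ClassYear == 1 + 2 evaluates to ClassYear == 3, so the first group holds only third-year students, while the second group (ClassYear > 2) holds third- and fourth-year students.

Summary: I set two NumEarlyClass variables using ClassYear == 1 + 2 and ClassYear > 2. I wrote 1 + 2 instead of ClassYear <= 2 to avoid 0 entries, but R adds the numbers first, so the condition selects ClassYear == 3 and leaves out first- and second-year students. Then I plugged both variables into a t-test and got a p-value = 0.5864, which is greater than .05, so the difference is not significant. Answering the original question requires rerunning the test with ClassYear values 1 and 2 in the first group.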
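One way to select the first two class years is the %in% operator, sketched below. The toy data frame and the helper split_by_year are made up for illustration; only the column names ClassYear and NumEarlyClass come from the dataset.

```r
# Made-up values, used only to show how the conditions behave
toy <- data.frame(
  ClassYear     = c(1, 2, 3, 4, 2, 1),
  NumEarlyClass = c(3, 2, 0, 1, 4, 2)
)

# Arithmetic is evaluated before ==, so this tests ClassYear == 3
toy$ClassYear == 1 + 2

# %in% tests membership in a set of values
toy$ClassYear %in% c(1, 2)

# Hypothetical helper: split one column into first-two-year and later groups
split_by_year <- function(data, column) {
  first_two <- data$ClassYear %in% c(1, 2)
  list(first_two = data[[column]][first_two],
       later     = data[[column]][!first_two])
}

groups <- split_by_year(toy, "NumEarlyClass")
groups$first_two  # 3 2 4 2
groups$later      # 0 1

# With the real data this could become:
# groups <- split_by_year(Sleep, "NumEarlyClass")
# t.test(groups$first_two, groups$later)
```

The same helper would apply to WeekendSleep in question 10, where the groups are built the same way.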

Q3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

larks_zcog <- Sleep$CognitionZscore[Sleep$LarkOwl == "Lark"]
owls_zcog <- Sleep$CognitionZscore[Sleep$LarkOwl == "Owl"]

t.test(larks_zcog, owls_zcog)
## 
##  Welch Two Sample t-test
## 
## data:  larks_zcog and owls_zcog
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
##   mean of x   mean of y 
##  0.09024390 -0.03836735

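No. Larks have a higher mean cognition z-score than owls (0.09 vs. -0.04), but the difference is not significant.

Summary: I set two CognitionZscore variables using LarkOwl == “Lark” and LarkOwl == “Owl”. Then I plugged both variables into a t-test and got a p-value = 0.4229, which is greater than .05, so the difference is not significant. The test was two-sided even though the question is directional; because the observed difference is in the hypothesized direction, a one-sided test would halve the p-value to about 0.21, which is still greater than .05.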
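Because the question asks whether larks score better, a one-sided test is an option. The sketch below uses the alternative argument of t.test on made-up z-scores; the vectors larks and owls are illustrative and are not taken from the dataset.

```r
# Made-up z-scores, used only to compare the two forms of the test
larks <- c(0.5, 0.1, 0.9, -0.2, 0.4)
owls  <- c(-0.1, 0.3, -0.4, 0.2, -0.3)

# Two-sided: is the mean of larks different from the mean of owls?
two_sided <- t.test(larks, owls)

# One-sided: is the mean of larks greater than the mean of owls?
one_sided <- t.test(larks, owls, alternative = "greater")

# When the t statistic is positive, the one-sided p-value is half
# the two-sided p-value
two_sided$p.value
one_sided$p.value

# With the real data this could become:
# t.test(larks_zcog, owls_zcog, alternative = "greater")
```

The direction should be chosen before looking at the data; picking it afterwards would overstate the evidence.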

Q4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

missed_early <- Sleep$ClassesMissed[Sleep$EarlyClass == 1]
missed_notearly <- Sleep$ClassesMissed[Sleep$EarlyClass == 0]

t.test(missed_early, missed_notearly)
## 
##  Welch Two Sample t-test
## 
## data:  missed_early and missed_notearly
## t = -1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.5412830  0.2233558
## sample estimates:
## mean of x mean of y 
##  1.988095  2.647059

No. Students without an early class missed more classes on average (2.65 vs. 1.99), but the difference is not significant.

Summary: I set two ClassesMissed variables using EarlyClass == 1 and EarlyClass == 0. Then I plugged both variables into a t-test and got a p-value = 0.1421, which is greater than .05, so the difference is not significant.

Q5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

happy_mod_dep <- Sleep$Happiness[Sleep$DepressionStatus == "moderate"]
happy_norm_dep <- Sleep$Happiness[Sleep$DepressionStatus == "normal"]

t.test(happy_mod_dep, happy_norm_dep)
## 
##  Welch Two Sample t-test
## 
## data:  happy_mod_dep and happy_norm_dep
## t = -4.3253, df = 43.992, p-value = 8.616e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.818614 -2.119748
## sample estimates:
## mean of x mean of y 
##  23.08824  27.05742

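Yes. Students with moderate depression status have a significantly lower average happiness score than students with normal status (23.09 vs. 27.06).

Summary: I set two Happiness variables using DepressionStatus == “moderate” and DepressionStatus == “normal”. The first filter keeps only students coded exactly as moderate, so any students with a more severe status are left out even though the question asks about at least moderate depression. Then I plugged both variables into a t-test and got a p-value = 0.00008616, which is less than .05, so the difference is significant.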
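To cover "at least moderate" depression, the first group could take every status other than normal. The sketch below assumes that any non-normal status counts as at least moderate; the toy data frame and the status value "severe" are illustrative, and the real levels should be checked with table() first.

```r
# Made-up values; "severe" is assumed here as an example of a status
# above moderate
toy <- data.frame(
  DepressionStatus = c("normal", "moderate", "severe", "normal", "moderate"),
  Happiness        = c(28, 22, 15, 30, 24)
)

# Check which status labels actually occur before filtering
table(toy$DepressionStatus)

# Everything that is not "normal" is treated as at least moderate
at_least_moderate <- toy$DepressionStatus != "normal"
happy_dep  <- toy$Happiness[at_least_moderate]
happy_norm <- toy$Happiness[!at_least_moderate]

happy_dep   # 22 15 24
happy_norm  # 28 30

# With the real data this could become:
# dep <- Sleep$DepressionStatus != "normal"
# t.test(Sleep$Happiness[dep], Sleep$Happiness[!dep])
```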

Q6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

sleep_quality_all <- Sleep$AverageSleep[Sleep$AllNighter >= 1]
sleep_quality_none <- Sleep$AverageSleep[Sleep$AllNighter == 0]

t.test(sleep_quality_all, sleep_quality_none)
## 
##  Welch Two Sample t-test
## 
## data:  sleep_quality_all and sleep_quality_none
## t = -4.4256, df = 42.171, p-value = 6.666e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.1685667 -0.4366603
## sample estimates:
## mean of x mean of y 
##  7.271176  8.073790

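Yes, for the variable that was tested. Students who reported at least one all-nighter get significantly less average sleep than students who reported none (7.27 vs. 8.07).

Summary: I set two AverageSleep variables using AllNighter >= 1 and AllNighter == 0. Then I plugged both variables into a t-test and got a p-value = 0.00006666, which is less than .05, so the difference is significant. Note that AverageSleep measures the amount of sleep, not its quality; the sleep quality score in the dataset is PoorSleepQuality, so answering the question as asked requires rerunning the test on that column.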
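A test on the sleep quality score itself could use the formula interface of t.test, which splits one column by a two-level grouping column. The toy data frame below is made up; only the column names PoorSleepQuality and AllNighter come from the dataset.

```r
# Made-up values, used only to show the formula interface
toy <- data.frame(
  AllNighter       = c(0, 0, 0, 1, 1, 1),
  PoorSleepQuality = c(4, 6, 5, 9, 8, 10)
)

# response ~ group: compares PoorSleepQuality between AllNighter == 0 and == 1
res <- t.test(PoorSleepQuality ~ AllNighter, data = toy)

res$estimate  # group means: 5 for AllNighter == 0, 9 for AllNighter == 1
res$p.value

# With the real data this could become:
# t.test(PoorSleepQuality ~ AllNighter, data = Sleep)
```

The formula form keeps the column name visible in the call, which makes it harder to test a different variable than the one the question names.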

Q7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

stress_alcohol <- Sleep$StressScore[Sleep$AlcoholUse == "Heavy"]
stress_no_alcohol <- Sleep$StressScore[Sleep$AlcoholUse == "Abstain"]

t.test(stress_no_alcohol, stress_alcohol)
## 
##  Welch Two Sample t-test
## 
## data:  stress_no_alcohol and stress_alcohol
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean of x mean of y 
##  8.970588 10.437500

No. Students who abstain from alcohol have a lower mean stress score than heavy drinkers (8.97 vs. 10.44), but the difference is not significant.

Summary: I set two StressScore variables using AlcoholUse == “Heavy” and AlcoholUse == “Abstain”. Then I plugged both variables into a t-test and got a p-value = 0.5362, which is greater than .05, so the difference is not significant.

Q8. Is there a significant difference in the average number of drinks per week between students of different genders?

male_drinks <- Sleep$Drinks[Sleep$Gender == 0]
female_drinks <- Sleep$Drinks[Sleep$Gender == 1]

t.test(male_drinks, female_drinks)
## 
##  Welch Two Sample t-test
## 
## data:  male_drinks and female_drinks
## t = -6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean of x mean of y 
##  4.238411  7.539216

Yes. There is a significant difference in the average number of drinks per week by gender: the Gender == 0 group averages 4.24 drinks and the Gender == 1 group averages 7.54.

Summary: I set two Drinks variables using Gender == 0 and Gender == 1. Then I plugged both variables into a t-test and got a p-value = 0.000000007, which is less than .05, so the difference is significant.

Q9. Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?

weekday_bed_high <- Sleep$WeekdayBed[Sleep$Stress == "high"]
weekday_bed_normal <- Sleep$WeekdayBed[Sleep$Stress == "normal"]

t.test(weekday_bed_high, weekday_bed_normal)
## 
##  Welch Two Sample t-test
## 
## data:  weekday_bed_high and weekday_bed_normal
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4856597  0.1447968
## sample estimates:
## mean of x mean of y 
##  24.71500  24.88543

No. The average weekday bedtime is slightly later for normal-stress students than for high-stress students (24.89 vs. 24.72), but the difference is not significant.

Summary: I set two WeekdayBed variables using Stress == “high” and Stress == “normal”. Then I plugged both variables into a t-test and got a p-value = 0.2855, which is greater than .05, so the difference is not significant.

Q10. Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

sleep_av_first_two <- Sleep$WeekendSleep[Sleep$ClassYear == 1 + 2]
sleep_av_other <- Sleep$WeekendSleep[Sleep$ClassYear != 1 + 2]

t.test(sleep_av_first_two, sleep_av_other)
## 
##  Welch Two Sample t-test
## 
## data:  sleep_av_first_two and sleep_av_other
## t = -0.3305, df = 79.128, p-value = 0.7419
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5109733  0.3654457
## sample estimates:
## mean of x mean of y 
##  8.160000  8.232764

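No. The test found no significant difference in average weekend sleep between the two groups it compared (8.16 vs. 8.23). As in question 2, ClassYear == 1 + 2 evaluates to ClassYear == 3, so the test compared third-year students with everyone else rather than first- and second-year students with the rest.

Summary: I set two WeekendSleep variables using ClassYear == 1 + 2 and ClassYear != 1 + 2. These conditions were meant to cover students in their first two years and everyone else, but they select ClassYear == 3 and ClassYear != 3. Then I plugged both variables into a t-test and got a p-value = 0.7419, which is greater than .05, so the difference is not significant.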

Summary

1. Is there a significant difference in the average GPA between male and female college students?

Yes. Assuming that Gender == 0 is male, there is a significant difference between male and female average GPA (3.32 vs. 3.12).

Summary: The t-test on GPA for Gender == 0 and Gender == 1 gave a p-value = 0.0001243, which is less than .05, so the difference is significant.

2. Is there a significant difference in the average number of early classes between the first two class years and other class years?

No significant difference was found, but the test did not compare the intended groups.

Summary: The condition ClassYear == 1 + 2 selects ClassYear == 3, so the t-test compared third-year students with students in years 3 and 4. It gave a p-value = 0.5864, which is greater than .05, so the difference is not significant.

3. Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

No. Larks have a higher mean cognition z-score than owls, but the difference is not significant.

Summary: The t-test on CognitionZscore for LarkOwl == “Lark” and LarkOwl == “Owl” gave a p-value = 0.4229, which is greater than .05, so the difference is not significant.

4. Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

No. Students without an early class missed more classes on average, but the difference is not significant.

Summary: The t-test on ClassesMissed for EarlyClass == 1 and EarlyClass == 0 gave a p-value = 0.1421, which is greater than .05, so the difference is not significant.

5. Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

Yes. Students with moderate depression status have a significantly lower average happiness score than students with normal status.

Summary: The t-test on Happiness for DepressionStatus == “moderate” and DepressionStatus == “normal” gave a p-value = 0.00008616, which is less than .05, so the difference is significant. Students with a status above moderate were not included.

6. Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

Yes, for the variable that was tested. Students with at least one all-nighter get significantly less average sleep than students with none.

Summary: The t-test on AverageSleep for AllNighter >= 1 and AllNighter == 0 gave a p-value = 0.00006666, which is less than .05, so the difference is significant. The test used AverageSleep rather than the sleep quality score PoorSleepQuality.

7. Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

No. Students who abstain from alcohol have a lower mean stress score than heavy drinkers, but the difference is not significant.

Summary: The t-test on StressScore for AlcoholUse == “Abstain” and AlcoholUse == “Heavy” gave a p-value = 0.5362, which is greater than .05, so the difference is not significant.

8. Is there a significant difference in the average number of drinks per week between students of different genders?

Yes. There is a significant difference in the average number of drinks per week by gender (4.24 for Gender == 0 vs. 7.54 for Gender == 1).

Summary: The t-test on Drinks for Gender == 0 and Gender == 1 gave a p-value = 0.000000007, which is less than .05, so the difference is significant.

9. Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?

No. There is no significant difference in average weekday bedtime between high-stress and normal-stress students.

Summary: The t-test on WeekdayBed for Stress == “high” and Stress == “normal” gave a p-value = 0.2855, which is greater than .05, so the difference is not significant.

10. Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

No significant difference was found, but the test did not compare the intended groups.

Summary: The conditions ClassYear == 1 + 2 and ClassYear != 1 + 2 select ClassYear == 3 and ClassYear != 3, so the t-test compared third-year students with everyone else. It gave a p-value = 0.7419, which is greater than .05, so the difference is not significant.
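The per-question results could also be collected into one table instead of being read off each output by hand. The helper summarise_test below is a hypothetical sketch, and the vectors x and y are made-up values; the fields statistic, parameter, and p.value are the ones t.test returns.

```r
# Hypothetical helper: turn one t.test result into a one-row data frame
summarise_test <- function(label, res, alpha = 0.05) {
  data.frame(
    question    = label,
    t           = unname(res$statistic),
    df          = unname(res$parameter),
    p_value     = res$p.value,
    significant = res$p.value < alpha
  )
}

# Made-up values, used only to show the shape of the output
x <- c(3.1, 3.4, 3.6, 3.3, 3.5)
y <- c(2.9, 3.0, 3.2, 2.8, 3.1)
row <- summarise_test("toy example", t.test(x, y))
row

# With the real data, rows for several questions could be stacked:
# rbind(summarise_test("Q1", t.test(avg_gpa_male, avg_gpa_female)),
#       summarise_test("Q3", t.test(larks_zcog, owls_zcog)))
```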

Conclusion

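Based on the ten tests, some general conclusions can be made. Gender is a significant factor in both GPA and weekly drink count: the Gender == 0 group has the higher average GPA and the Gender == 1 group reports more drinks per week. Students with moderate depression status report lower happiness than students with normal status, and students with at least one all-nighter get less average sleep. The comparisons of cognition scores between larks and owls, classes missed by early-class status, stress scores by alcohol use, and weekday bedtime by stress level showed no significant difference. The two class-year comparisons (questions 2 and 10) also showed no significant difference, but they compared third-year students with other students, so they do not yet answer the questions about the first two class years.

By doing this project we have shown that p-values from two-sample t-tests can be used to evaluate whether a difference between two group means is statistically significant. It is important to note that comparisons involving small groups have less power, so a real difference is harder to detect and may still produce a large p-value. We also found that we can use statistics to better understand questions pertaining to collected data.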
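The effect of group size on a t-test can be illustrated with power.t.test from base R's stats package. The effect size, standard deviation, and group sizes below are hypothetical values chosen for illustration, not estimates from the dataset, and the function assumes equal group sizes.

```r
# Hypothetical scenario: true difference of 1.5 points, standard deviation 4
small <- power.t.test(n = 16,  delta = 1.5, sd = 4, sig.level = 0.05)
large <- power.t.test(n = 100, delta = 1.5, sd = 4, sig.level = 0.05)

# Power is the probability of getting p < 0.05 when the difference is real;
# it is higher with 100 students per group than with 16
small$power
large$power
```

A non-significant result from a small group is therefore weak evidence that no difference exists.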

Appendix

# Q1 (assumes Gender == 0 is male)
avg_gpa_male <- Sleep$GPA[Sleep$Gender == 0]
avg_gpa_female <- Sleep$GPA[Sleep$Gender == 1]
t.test(avg_gpa_male, avg_gpa_female)
# Q2 (note: ClassYear == 1 + 2 evaluates to ClassYear == 3)
early_class_2year <- Sleep$NumEarlyClass[Sleep$ClassYear == 1 + 2]
early_class_morethan2year <- Sleep$NumEarlyClass[Sleep$ClassYear > 2]
t.test(early_class_2year, early_class_morethan2year)
# Q3 
larks_zcog <- Sleep$CognitionZscore[Sleep$LarkOwl == "Lark"]
owls_zcog <- Sleep$CognitionZscore[Sleep$LarkOwl == "Owl"]
t.test(larks_zcog, owls_zcog)
# Q4 
missed_early <- Sleep$ClassesMissed[Sleep$EarlyClass == 1]
missed_notearly <- Sleep$ClassesMissed[Sleep$EarlyClass == 0]
t.test(missed_early, missed_notearly)
# Q5 (note: == "moderate" leaves out any status above moderate)
happy_mod_dep <- Sleep$Happiness[Sleep$DepressionStatus == "moderate"]
happy_norm_dep <- Sleep$Happiness[Sleep$DepressionStatus == "normal"]
t.test(happy_mod_dep, happy_norm_dep)
# Q6 (note: AverageSleep is amount of sleep; PoorSleepQuality is the quality score)
sleep_quality_all <- Sleep$AverageSleep[Sleep$AllNighter >= 1]
sleep_quality_none <- Sleep$AverageSleep[Sleep$AllNighter == 0]
t.test(sleep_quality_all, sleep_quality_none)
# Q7 
stress_alcohol <- Sleep$StressScore[Sleep$AlcoholUse == "Heavy"]
stress_no_alcohol <- Sleep$StressScore[Sleep$AlcoholUse == "Abstain"]
t.test(stress_no_alcohol, stress_alcohol)
# Q8 (assumes Gender == 0 is male)
male_drinks <- Sleep$Drinks[Sleep$Gender == 0]
female_drinks <- Sleep$Drinks[Sleep$Gender == 1]
t.test(male_drinks, female_drinks)
# Q9 
weekday_bed_high <- Sleep$WeekdayBed[Sleep$Stress == "high"]
weekday_bed_normal <- Sleep$WeekdayBed[Sleep$Stress == "normal"]
t.test(weekday_bed_high, weekday_bed_normal)
# Q10 (note: ClassYear == 1 + 2 evaluates to ClassYear == 3)
sleep_av_first_two <- Sleep$WeekendSleep[Sleep$ClassYear == 1 + 2]
sleep_av_other <- Sleep$WeekendSleep[Sleep$ClassYear != 1 + 2]
t.test(sleep_av_first_two, sleep_av_other)