This project analyzes the provided “SleepStudy” dataset. Each of the provided research questions is addressed with methods learned in class, in every case a Welch two-sample t-test.
The 10 questions to be analyzed are as follows:
1) Is there a significant difference in the average GPA between male and female college students?
2) Is there a significant difference in the average number of early classes between the first two class years and other class years?
3) Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
4) Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
5) Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
6) Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
7) Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
8) Is there a significant difference in the average number of drinks per week between students of different genders?
9) Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?
10) Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?
Each of the 10 questions is examined in turn below, after a look at the first rows of the data.
## Gender ClassYear LarkOwl NumEarlyClass EarlyClass GPA ClassesMissed
## 1 0 4 Neither 0 0 3.60 0
## 2 0 4 Neither 2 1 3.24 0
## 3 0 4 Owl 0 0 2.97 12
## 4 0 1 Lark 5 1 3.76 0
## 5 0 4 Owl 0 0 3.20 4
## 6 1 4 Neither 0 0 3.50 0
## CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1 -0.26 4 4 3 8
## 2 1.39 6 1 0 3
## 3 0.38 18 18 18 9
## 4 1.39 9 1 4 6
## 5 1.22 9 7 25 14
## 6 -0.04 6 14 8 28
## DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1 normal normal normal 15 28 Moderate 10
## 2 normal normal normal 4 25 Moderate 6
## 3 moderate severe normal 45 17 Light 3
## 4 normal normal normal 11 32 Light 2
## 5 normal severe normal 46 15 Moderate 4
## 6 moderate moderate high 50 22 Abstain 0
## WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1 25.75 8.70 7.70 25.75 9.50 5.88
## 2 25.70 8.20 6.80 26.00 10.00 7.25
## 3 27.44 6.55 3.00 28.00 12.59 10.09
## 4 23.50 7.17 6.77 27.00 8.00 7.25
## 5 25.90 8.67 6.09 23.75 9.50 7.00
## 6 23.80 8.95 9.05 26.00 10.75 9.00
## AverageSleep AllNighter
## 1 7.18 0
## 2 6.93 0
## 3 5.02 0
## 4 6.90 0
## 5 6.35 0
## 6 9.04 0
male = c(SleepStudyData$GPA[SleepStudyData$Gender == 1])
female = c(SleepStudyData$GPA[SleepStudyData$Gender == 0])
t.test(male, female, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: male and female
## t = -3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.30252780 -0.09982254
## sample estimates:
## mean of x mean of y
## 3.123725 3.324901
result = t.test(male, female, var.equal = FALSE)
result$p.value
## [1] 0.000124298
Welch’s two-sample t-test gives a mean GPA of 3.12 for male students and 3.32 for female students. The p-value (0.00012) is below 0.05, so the difference is statistically significant.
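The same split-then-test pattern is repeated for every question in this report, so it could be wrapped in one function. The sketch below is only an illustration: `compare_groups` and the `toy` data frame are hypothetical names that do not appear in the original analysis, and the toy values are made up rather than taken from SleepStudy.

```r
# Hypothetical helper (not part of the original report): Welch t-test of one
# numeric column between two groups defined by values of a grouping column.
compare_groups <- function(data, outcome, group, levels_x, levels_y) {
  x <- data[[outcome]][data[[group]] %in% levels_x]
  y <- data[[outcome]][data[[group]] %in% levels_y]
  res <- t.test(x, y, var.equal = FALSE)  # var.equal = FALSE requests Welch's test
  list(
    mean_x  = mean(x, na.rm = TRUE),
    mean_y  = mean(y, na.rm = TRUE),
    t       = unname(res$statistic),
    df      = unname(res$parameter),
    p_value = res$p.value
  )
}

# Made-up toy data, only to show the call shape; these are NOT SleepStudy values.
toy <- data.frame(
  Gender = c(1, 1, 1, 1, 0, 0, 0, 0),
  GPA    = c(3.1, 2.9, 3.3, 3.0, 3.4, 3.6, 3.2, 3.5)
)

out <- compare_groups(toy, "GPA", "Gender", levels_x = 1, levels_y = 0)
out$mean_x   # mean of the Gender == 1 group
out$mean_y   # mean of the Gender == 0 group
out$p_value  # two-sided Welch p-value
```

With the real data, a call such as `compare_groups(SleepStudyData, "GPA", "Gender", 1, 0)` would be expected to reproduce the group means reported above, since it runs the same `t.test` call on the same subsets.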
Year12 = c(SleepStudyData$NumEarlyClass[SleepStudyData$ClassYear %in% c(1,2)])
Year34 = c(SleepStudyData$NumEarlyClass[SleepStudyData$ClassYear %in% c(3,4)])
t.test(Year12, Year34, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: Year12 and Year34
## t = 4.1813, df = 250.69, p-value = 4.009e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.4042016 1.1240309
## sample estimates:
## mean of x mean of y
## 2.070423 1.306306
result = t.test(Year12, Year34, var.equal=FALSE)
result$p.value
## [1] 4.009356e-05
Welch’s two-sample t-test gives a mean of 2.07 early classes for students in their first two years and 1.31 for students in later years. The p-value (about 0.00004) is far below 0.05, so the difference is statistically significant.
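To make clear what the Welch test reports, the statistic and its degrees of freedom can be recomputed by hand. With group means, variances and sizes written as x̄, s², n, the statistic is t = (x̄₁ − x̄₂) / sqrt(s₁²/n₁ + s₂²/n₂), and the degrees of freedom come from the Welch-Satterthwaite approximation. The sketch below checks this against `t.test`; the function name `welch_by_hand` and the two small vectors are illustrative assumptions, not values from the dataset.

```r
# Hypothetical helper: Welch t statistic, Welch-Satterthwaite df and two-sided
# p-value computed directly from the textbook formulas.
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error contributed by x
  vy <- var(y) / length(y)  # squared standard error contributed by y
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p <- 2 * pt(-abs(t_stat), df)  # two-sided p-value from the t distribution
  c(t = t_stat, df = df, p_value = p)
}

# Made-up counts, for illustration only.
x <- c(2, 3, 1, 4, 2, 3)
y <- c(1, 0, 2, 1, 1)

manual  <- welch_by_hand(x, y)
builtin <- t.test(x, y, var.equal = FALSE)

# Each comparison should report TRUE if the hand calculation matches t.test.
all.equal(unname(manual["t"]), unname(builtin$statistic))
all.equal(unname(manual["df"]), unname(builtin$parameter))
all.equal(unname(manual["p_value"]), builtin$p.value)
```

The non-integer degrees of freedom in the outputs of this report (for example df = 250.69) come from this approximation, which does not assume the two groups share a variance.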
larks <- c(SleepStudyData$CognitionZscore[SleepStudyData$LarkOwl == "Lark"])
owls <- c(SleepStudyData$CognitionZscore[SleepStudyData$LarkOwl == "Owl"])
t.test(larks, owls, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: larks and owls
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1893561 0.4465786
## sample estimates:
## mean of x mean of y
## 0.09024390 -0.03836735
result = t.test(larks, owls, var.equal = FALSE)
result$p.value
## [1] 0.4229482
Welch’s two-sample t-test gives a mean cognition z-score of 0.09 for students who identify as larks and -0.04 for students who identify as owls. The p-value (0.42) is greater than 0.05, so the difference is not statistically significant.
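The research question here is directional (do larks score better than owls?), while the test above is two-sided. A one-sided variant can be requested through the `alternative` argument of `t.test`. The sketch below uses made-up vectors (`larks_toy`, `owls_toy` are hypothetical names and values) purely to show the call.

```r
# Made-up z-scores, for illustration only; NOT values from SleepStudy.
larks_toy <- c(0.4, 0.1, -0.2, 0.6, 0.3)
owls_toy  <- c(-0.1, 0.2, -0.4, 0.0, -0.3, 0.1)

# Two-sided test: H1 is "the means differ".
two_sided <- t.test(larks_toy, owls_toy, var.equal = FALSE)

# One-sided test: H1 is "mean of the first group is greater than the second".
one_sided <- t.test(larks_toy, owls_toy, var.equal = FALSE,
                    alternative = "greater")

two_sided$p.value
one_sided$p.value  # half the two-sided value when the t statistic is positive
```

Because the reported t statistic (0.806) is positive, the one-sided p-value for the real data would be about half of 0.4229, roughly 0.21, which is still above 0.05, so the conclusion would not change.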
Early = c(SleepStudyData$ClassesMissed[SleepStudyData$EarlyClass == 1])
NotEarly = c(SleepStudyData$ClassesMissed[SleepStudyData$EarlyClass == 0])
t.test(Early, NotEarly, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: Early and NotEarly
## t = -1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.5412830 0.2233558
## sample estimates:
## mean of x mean of y
## 1.988095 2.647059
result = t.test(Early, NotEarly, var.equal = FALSE)
result$p.value
## [1] 0.1421377
Welch’s two-sample t-test gives a mean of 1.99 classes missed for students with at least one early class and 2.65 for students with no early classes. The p-value (0.14) is greater than 0.05, so the difference is not statistically significant.
high = c(SleepStudyData$Happiness[SleepStudyData$DepressionStatus %in% c("moderate","severe")])
low = c(SleepStudyData$Happiness[SleepStudyData$DepressionStatus == "normal"])
t.test(high, low, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: high and low
## t = -5.6339, df = 55.594, p-value = 6.057e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.379724 -3.507836
## sample estimates:
## mean of x mean of y
## 21.61364 27.05742
result = t.test(high, low, var.equal = FALSE)
result$p.value
## [1] 6.056559e-07
Welch’s two-sample t-test gives a mean happiness score of 21.61 for students with moderate or severe depression and 27.06 for students with normal depression status. The p-value (6.06e-07) is far below 0.05, so students with moderate or severe depression have a significantly lower mean happiness score.
AllNighter = c(SleepStudyData$PoorSleepQuality[SleepStudyData$AllNighter == 1])
NoAllNighter = c(SleepStudyData$PoorSleepQuality[SleepStudyData$AllNighter == 0])
t.test(AllNighter, NoAllNighter, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: AllNighter and NoAllNighter
## t = 1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1608449 1.9456958
## sample estimates:
## mean of x mean of y
## 7.029412 6.136986
result = t.test(AllNighter, NoAllNighter, var.equal = FALSE)
result$p.value
## [1] 0.09478991
Welch’s two-sample t-test gives a mean PoorSleepQuality score of 7.03 for students who reported at least one all-nighter and 6.14 for students who did not. Higher values of this variable indicate poorer sleep, so students without an all-nighter slept somewhat better on average, but the p-value (0.095) is above 0.05 and the difference is not statistically significant.
NoAlch = c(SleepStudyData$StressScore[SleepStudyData$AlcoholUse == "Abstain"])
Alch = c(SleepStudyData$StressScore[SleepStudyData$AlcoholUse == "Heavy"])
t.test(NoAlch, Alch, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: NoAlch and Alch
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.261170 3.327346
## sample estimates:
## mean of x mean of y
## 8.970588 10.437500
result = t.test(NoAlch, Alch, var.equal = FALSE)
result$p.value
## [1] 0.5362324
Welch’s two-sample t-test gives a mean stress score of 8.97 for students who abstain from alcohol and 10.44 for heavy drinkers. Although the mean stress score is higher for heavy drinkers, the p-value (0.54) is greater than 0.05, so the difference is not statistically significant.
male = c(SleepStudyData$Drinks[SleepStudyData$Gender == 1])
female = c(SleepStudyData$Drinks[SleepStudyData$Gender == 0])
t.test(male, female, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: male and female
## t = 6.1601, df = 142.75, p-value = 7.002e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.241601 4.360009
## sample estimates:
## mean of x mean of y
## 7.539216 4.238411
result = t.test(male, female, var.equal = FALSE)
result$p.value
## [1] 7.001743e-09
Welch’s two-sample t-test gives a mean of 7.54 drinks per week for male students and 4.24 for female students. The p-value (7.0e-09) is far below 0.05, so the difference in drinks per week between male and female students is statistically significant.
HStress = c(SleepStudyData$WeekdayBed[SleepStudyData$Stress == "high"])
NStress = c(SleepStudyData$WeekdayBed[SleepStudyData$Stress == "normal"])
t.test(HStress, NStress, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: HStress and NStress
## t = -1.0746, df = 87.048, p-value = 0.2855
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.4856597 0.1447968
## sample estimates:
## mean of x mean of y
## 24.71500 24.88543
result = t.test(HStress, NStress, var.equal = FALSE)
result$p.value
## [1] 0.2855177
Welch’s two-sample t-test gives a mean weekday bedtime of 24.72 (about 12:43am) for students with high stress and 24.89 (about 12:53am) for students with normal stress. Although students with high stress go to bed slightly earlier on weekdays, the p-value (0.29) is greater than 0.05, so the difference is not statistically significant.
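Bedtimes in this dataset are stored as decimal hours, with values above 24 meaning after midnight, so a mean such as 24.715 has to be converted before it reads as a clock time. The helper below is a small sketch of that arithmetic; `decimal_to_clock` is a hypothetical name, not a function used in the original analysis.

```r
# Hypothetical helper: convert decimal hours (24 = midnight, 25.5 = 1:30 AM)
# into 12-hour clock strings, rounding to the nearest minute.
decimal_to_clock <- function(h) {
  total_min <- round((h %% 24) * 60)   # minutes after midnight
  total_min <- total_min %% (24 * 60)  # guard against rounding up to 24:00
  hh <- total_min %/% 60
  mm <- total_min %% 60
  suffix <- ifelse(hh < 12, "AM", "PM")
  hh12 <- ifelse(hh %% 12 == 0, 12, hh %% 12)
  sprintf("%d:%02d %s", as.integer(hh12), as.integer(mm), suffix)
}

# The two group means reported by the t-test output above.
decimal_to_clock(c(24.715, 24.88543))
# [1] "12:43 AM" "12:53 AM"
```

The same function works for bedtimes before midnight, for example `decimal_to_clock(23.5)` gives "11:30 PM".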
First2 = c(SleepStudyData$WeekendSleep[SleepStudyData$ClassYear %in% c(1,2)])
Last2 = c(SleepStudyData$WeekendSleep[SleepStudyData$ClassYear %in% c(3,4)])
t.test(First2, Last2, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: First2 and Last2
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3497614 0.3331607
## sample estimates:
## mean of x mean of y
## 8.213592 8.221892
result = t.test(First2, Last2, var.equal = FALSE)
result$p.value
## [1] 0.9618461
Welch’s two-sample t-test gives a mean of 8.21 hours of weekend sleep for students in their first two years and 8.22 hours for students in their last two years. The p-value (0.96) is far above 0.05, so there is no statistically significant difference in weekend sleep between the two class-year groups.
In conclusion, the methods from the class material proved useful for comparing groups within a dataset. This project helped me practice those concepts and my problem-solving skills, and it made me more familiar with Posit.
knitr::opts_chunk$set(echo = TRUE)
SleepStudyData = read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(SleepStudyData)
male = c(SleepStudyData$GPA[SleepStudyData$Gender == 1])
female = c(SleepStudyData$GPA[SleepStudyData$Gender == 0])
t.test(male, female, var.equal = FALSE)
result = t.test(male, female, var.equal = FALSE)
result$p.value
Year12 = c(SleepStudyData$NumEarlyClass[SleepStudyData$ClassYear %in% c(1,2)])
Year34 = c(SleepStudyData$NumEarlyClass[SleepStudyData$ClassYear %in% c(3,4)])
t.test(Year12, Year34, var.equal=FALSE)
result = t.test(Year12, Year34, var.equal=FALSE)
result$p.value
larks <- c(SleepStudyData$CognitionZscore[SleepStudyData$LarkOwl == "Lark"])
owls <- c(SleepStudyData$CognitionZscore[SleepStudyData$LarkOwl == "Owl"])
t.test(larks, owls, var.equal = FALSE)
result = t.test(larks, owls, var.equal = FALSE)
result$p.value
Early = c(SleepStudyData$ClassesMissed[SleepStudyData$EarlyClass == 1])
NotEarly = c(SleepStudyData$ClassesMissed[SleepStudyData$EarlyClass == 0])
t.test(Early, NotEarly, var.equal = FALSE)
result = t.test(Early, NotEarly, var.equal = FALSE)
result$p.value
high = c(SleepStudyData$Happiness[SleepStudyData$DepressionStatus %in% c("moderate","severe")])
low = c(SleepStudyData$Happiness[SleepStudyData$DepressionStatus == "normal"])
t.test(high, low, var.equal = FALSE)
result = t.test(high, low, var.equal = FALSE)
result$p.value
AllNighter = c(SleepStudyData$PoorSleepQuality[SleepStudyData$AllNighter == 1])
NoAllNighter = c(SleepStudyData$PoorSleepQuality[SleepStudyData$AllNighter == 0])
t.test(AllNighter, NoAllNighter, var.equal = FALSE)
result = t.test(AllNighter, NoAllNighter, var.equal = FALSE)
result$p.value
NoAlch = c(SleepStudyData$StressScore[SleepStudyData$AlcoholUse == "Abstain"])
Alch = c(SleepStudyData$StressScore[SleepStudyData$AlcoholUse == "Heavy"])
t.test(NoAlch, Alch, var.equal = FALSE)
result = t.test(NoAlch, Alch, var.equal = FALSE)
result$p.value
male = c(SleepStudyData$Drinks[SleepStudyData$Gender == 1])
female = c(SleepStudyData$Drinks[SleepStudyData$Gender == 0])
t.test(male, female, var.equal = FALSE)
result = t.test(male, female, var.equal = FALSE)
result$p.value
HStress = c(SleepStudyData$WeekdayBed[SleepStudyData$Stress == "high"])
NStress = c(SleepStudyData$WeekdayBed[SleepStudyData$Stress == "normal"])
t.test(HStress, NStress, var.equal = FALSE)
result = t.test(HStress, NStress, var.equal = FALSE)
result$p.value
First2 = c(SleepStudyData$WeekendSleep[SleepStudyData$ClassYear %in% c(1,2)])
Last2 = c(SleepStudyData$WeekendSleep[SleepStudyData$ClassYear %in% c(3,4)])
t.test(First2, Last2, var.equal = FALSE)
result = t.test(First2, Last2, var.equal = FALSE)
result$p.value