We use the SleepStudy data from lock5stat.com.
I propose the following 10 questions:
Is there a significant difference in the average GPA between male and female college students?
Is there a significant difference in the average number of early classes between the first two class years and the other class years?
Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
Is there a significant difference in the average number of drinks per week between students of different genders?
Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?
Is there a significant difference in the average hours of sleep on weekends between students in their first two class years and students in later class years?
We will explore the questions here.
sleep <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(sleep)
## Gender ClassYear LarkOwl NumEarlyClass EarlyClass GPA ClassesMissed
## 1 0 4 Neither 0 0 3.60 0
## 2 0 4 Neither 2 1 3.24 0
## 3 0 4 Owl 0 0 2.97 12
## 4 0 1 Lark 5 1 3.76 0
## 5 0 4 Owl 0 0 3.20 4
## 6 1 4 Neither 0 0 3.50 0
## CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1 -0.26 4 4 3 8
## 2 1.39 6 1 0 3
## 3 0.38 18 18 18 9
## 4 1.39 9 1 4 6
## 5 1.22 9 7 25 14
## 6 -0.04 6 14 8 28
## DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1 normal normal normal 15 28 Moderate 10
## 2 normal normal normal 4 25 Moderate 6
## 3 moderate severe normal 45 17 Light 3
## 4 normal normal normal 11 32 Light 2
## 5 normal severe normal 46 15 Moderate 4
## 6 moderate moderate high 50 22 Abstain 0
## WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1 25.75 8.70 7.70 25.75 9.50 5.88
## 2 25.70 8.20 6.80 26.00 10.00 7.25
## 3 27.44 6.55 3.00 28.00 12.59 10.09
## 4 23.50 7.17 6.77 27.00 8.00 7.25
## 5 25.90 8.67 6.09 23.75 9.50 7.00
## 6 23.80 8.95 9.05 26.00 10.75 9.00
## AverageSleep AllNighter
## 1 7.18 0
## 2 6.93 0
## 3 5.02 0
## 4 6.90 0
## 5 6.35 0
## 6 9.04 0
sleep$LarkOwlGroup <- ifelse(sleep$LarkOwl %in% c("Lark", "Owl"), 1,2)
sleep$LarkOwlGroup
## [1] 2 2 1 1 1 2 1 1 2 2 2 2 2 2 2 1 2 1 2 1 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2
## [38] 2 1 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 1 2 2 1 2 1 2 2 2 2 2 1 1
## [75] 2 1 2 2 1 1 2 2 1 2 1 2 2 1 1 2 1 2 2 2 2 1 2 2 1 1 1 2 1 2 2 1 1 2 2 2 1
## [112] 1 2 2 2 1 1 1 2 2 2 2 1 2 2 1 2 2 2 2 2 1 2 2 2 2 1 1 2 2 2 1 2 2 1 1 1 1
## [149] 1 2 2 2 2 2 2 1 2 2 2 1 2 1 1 1 1 1 2 1 1 2 2 2 1 2 1 2 1 2 2 2 2 2 2 2 2
## [186] 2 1 2 2 1 1 1 1 1 2 2 2 2 1 1 2 2 1 2 1 2 2 1 2 1 2 2 2 1 1 2 2 1 2 1 1 2
## [223] 2 1 2 1 2 1 1 1 2 2 2 2 1 2 2 2 2 1 2 1 2 2 1 1 2 1 2 2 2 2 2
sleep$AlcoholUseGroup <- ifelse(sleep$AlcoholUse %in% c("Abstain", "Light"), 1, 2)
sleep$AlcoholUseGroup
## [1] 2 2 1 1 2 1 2 1 1 2 2 2 2 1 1 1 2 2 2 2 1 1 2 1 1 1 2 2 1 2 2 1 1 2 1 1 1
## [38] 1 2 2 2 2 1 1 1 2 1 1 2 1 2 1 1 2 2 1 2 2 1 2 1 1 2 2 1 1 1 2 2 1 1 1 2 2
## [75] 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 2 2 2 1 2 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 1
## [112] 1 1 2 2 2 2 1 1 2 2 2 1 2 1 1 1 2 2 1 2 1 2 1 2 1 2 1 2 1 1 2 1 1 1 1 2 2
## [149] 2 2 1 2 2 1 2 2 2 1 1 2 2 1 2 2 1 1 2 2 2 1 1 1 2 2 2 2 2 2 2 1 2 1 1 2 2
## [186] 2 2 1 2 2 2 1 2 2 1 2 1 2 2 2 2 1 1 2 2 2 2 1 2 2 1 1 2 1 1 1 1 1 2 2 2 2
## [223] 2 1 1 2 2 2 1 1 1 1 2 1 2 2 2 1 2 2 1 1 2 2 2 2 1 1 1 1 2 2 2
sleep$DepressionStatusGroup <- ifelse(sleep$AlcoholUse %in% c("Normal", "Moderate"), 1, 2)
sleep$DepressionStatusGroup
## [1] 1 1 2 2 1 2 1 2 2 1 1 1 1 2 2 2 1 2 1 1 2 2 1 2 2 2 1 1 2 1 1 2 2 1 2 2 2
## [38] 2 1 1 1 1 2 2 2 1 2 2 1 2 1 2 2 1 1 2 1 2 2 1 2 2 1 1 2 2 2 1 1 2 2 2 1 1
## [75] 2 2 2 2 1 1 2 1 2 2 2 2 2 2 2 1 1 2 2 1 2 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2
## [112] 2 2 1 1 1 1 2 2 1 1 1 2 1 2 2 2 1 1 2 1 2 1 2 2 2 1 2 1 2 2 1 2 2 2 2 1 1
## [149] 1 1 2 1 1 2 1 1 1 2 2 1 1 2 1 2 2 2 1 1 1 2 2 2 1 2 1 1 1 1 1 2 1 2 2 1 1
## [186] 1 1 2 1 1 1 2 2 1 2 1 2 1 2 1 1 2 2 1 2 2 1 2 1 1 2 2 1 2 2 2 2 2 2 1 1 2
## [223] 1 2 2 2 2 1 2 2 2 2 1 2 1 2 1 2 1 1 2 2 1 1 1 1 2 2 2 2 1 1 1
cor(sleep$Gender, sleep$GPA, use = "complete.obs")
## [1] -0.2445769
scatter.smooth(sleep$GPA, main = "Average GPA", xlab= "Students", ylab= "GPA" )
hist(sleep$Gender, main="Gender", xlab="Female Male",ylab="Number of Students")
There is a weak negative correlation (about -0.24) between gender and
GPA. With Gender coded 0 for female and 1 for male, this means males
tend to have a lower GPA than females.
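A correlation coefficient by itself does not say whether the difference in group means is significant. One way to address the question directly is Welch's two-sample t-test with `t.test()`. The sketch below is an assumption about how that comparison could be run, not part of the original analysis. The `toy` data frame holds made-up values so the block runs without a download; on the real data the call would be `t.test(GPA ~ Gender, data = sleep)`.

```r
# Toy stand-in for two SleepStudy columns. These values are invented for
# illustration and are NOT taken from the real data set.
toy <- data.frame(
  Gender = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),  # 0 = female, 1 = male
  GPA    = c(3.6, 3.4, 3.8, 3.5, 3.7, 3.1, 3.3, 2.9, 3.2, 3.0)
)

# Welch two-sample t-test (t.test does not assume equal variances by default).
res <- t.test(GPA ~ Gender, data = toy)

res$estimate  # mean GPA in each gender group
res$p.value   # a small p-value suggests the two group means differ
```

The p-value and confidence interval from `t.test()` answer the "significant difference" wording of the question more directly than `cor()` does.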
cor(sleep$ClassYear, sleep$NumEarlyClass, use = "complete.obs")
## [1] -0.2687247
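scatter.smooth(sleep$NumEarlyClass, sleep$ClassYear, main = "Class Year and Early Classes Taken", xlab= "Early Classes Taken", ylab= "Class Year" )
There is a weak negative correlation (about -0.27) between class year
and the number of early classes: students in later class years tend to
take fewer early classes.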
cor(sleep$LarkOwlGroup, sleep$CognitionZscore, use = "complete.obs")
## [1] -0.02134276
hist(sleep$LarkOwlGroup, main= "Owl Vs. Lark", xlab= "Owl Lark", ylab="Students")
hist(sleep$CognitionZscore, main= "Cognition Z Score", xlab= "Z Score", ylab="Students")
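LarkOwlGroup puts larks and owls in the same group (1) and everyone else in group 2, so the correlation above compares "Lark or Owl" with "Neither" rather than larks with owls. A possible alternative, sketched here under the assumption that a one-sided Welch t-test is an acceptable way to read "significantly better", is to drop the "Neither" students and compare the two remaining groups. The `toy` values are invented so the block runs on its own; with the real data, `toy` would be replaced by `sleep`.

```r
# Toy stand-in for two SleepStudy columns. Values are invented for
# illustration and are NOT taken from the real data set.
toy <- data.frame(
  LarkOwl = c("Lark", "Lark", "Lark", "Lark",
              "Owl", "Owl", "Owl", "Owl",
              "Neither", "Neither"),
  CognitionZscore = c(0.5, 0.3, 0.8, 0.4, -0.1, 0.2, -0.3, 0.0, 0.1, 0.6)
)

# Keep only larks and owls; "Neither" is not part of the question.
larks_owls <- subset(toy, LarkOwl %in% c("Lark", "Owl"))

# Fix the level order so the test compares Lark minus Owl.
larks_owls$LarkOwl <- factor(larks_owls$LarkOwl, levels = c("Lark", "Owl"))

# One-sided test: is the mean cognition z-score of larks greater than that of owls?
res <- t.test(CognitionZscore ~ LarkOwl, data = larks_owls,
              alternative = "greater")

res$estimate  # mean z-score for larks and for owls
res$p.value
```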
cor(sleep$ClassesMissed, sleep$NumEarlyClass, use = "complete.obs")
## [1] -0.08284114
scatter.smooth(sleep$ClassesMissed, main = "Students Missed Class", xlab= "Students", ylab= "Classes Missed")
scatter.smooth(sleep$NumEarlyClass, main = "Number of Early Classes", xlab= "Students", ylab= "Early Classes")
There is only a very weak negative correlation (about -0.08) between
the number of classes missed and the number of early classes, so having
early classes is not clearly related to missing class.
cor(sleep$Happiness, sleep$DepressionStatusGroup, use = "complete.obs")
## [1] 0.01485652
hist(sleep$Happiness, main = "Happiness vs Depression", xlab="Happiness", ylab="Students")
hist(sleep$DepressionStatusGroup, main= "Depression Status", xlab="Moderate or Normal Depression", ylab="Students")
The correlation is about 0.015, which is essentially zero. However,
DepressionStatusGroup was built from AlcoholUse rather than
DepressionStatus, so it actually splits students by alcohol use and this
value says nothing about happiness and depression status.
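To compare happiness between students with normal depression status and those with at least moderate depression, the grouping variable has to come from DepressionStatus itself. The sketch below shows one way this could be done; it assumes the lowercase labels seen in the `head(sleep)` output, and the `toy` values are invented so the block runs on its own. The name `DepressionGroup` is hypothetical and does not appear in the original code.

```r
# Toy stand-in for two SleepStudy columns. Values are invented for
# illustration and are NOT taken from the real data set.
toy <- data.frame(
  DepressionStatus = c("normal", "normal", "normal", "normal",
                       "moderate", "severe", "moderate", "moderate"),
  Happiness = c(28, 30, 26, 32, 20, 15, 22, 19)
)

# Everything that is not "normal" counts as at least moderate depression.
toy$DepressionGroup <- ifelse(toy$DepressionStatus == "normal",
                              "normal", "at least moderate")

# Average happiness in each group.
group_means <- tapply(toy$Happiness, toy$DepressionGroup, mean)
group_means

# Welch two-sample t-test on the two groups.
res <- t.test(Happiness ~ DepressionGroup, data = toy)
res$p.value
```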
cor(sleep$PoorSleepQuality, sleep$AllNighter, use = "complete.obs")
## [1] 0.1044542
scatter.smooth(sleep$PoorSleepQuality, main = "Sleep Quality", xlab= "Students", ylab= "Poor Sleep Quality")
hist(sleep$AllNighter, main = "Pulling an All Nighter", xlab= "Pulled an All Nighter", ylab= "Students")
The correlation between poor sleep quality and having pulled an
all-nighter is small (about 0.10), so the two are at most weakly related.
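A histogram of a 0/1 variable only shows how many students fall in each group. Side-by-side boxplots may be a more direct picture for this kind of question, because they show the outcome within each group. This is a sketch with invented `toy` values, not output from the real data; on the real data the formula would stay the same with `data = sleep`.

```r
# Toy stand-in for two SleepStudy columns. Values are invented for
# illustration and are NOT taken from the real data set.
toy <- data.frame(
  AllNighter       = c(0, 0, 0, 0, 1, 1, 1),
  PoorSleepQuality = c(4, 5, 6, 7, 8, 9, 10)
)

# One box per group; boxplot() also returns the summary statistics it drew.
bp <- boxplot(PoorSleepQuality ~ AllNighter, data = toy,
              names = c("No all-nighter", "All-nighter"),
              main = "Sleep Quality by All-Nighter",
              xlab = "Pulled an All Nighter",
              ylab = "Poor Sleep Quality")

bp$n           # number of students in each group
bp$stats[3, ]  # median score in each group
```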
cor(sleep$AlcoholUseGroup, sleep$StressScore, use = "complete.obs")
## [1] 0.01555206
hist(sleep$AlcoholUseGroup, main = "Alcohol Use", xlab="Not Drink and Drinking", ylab="Students")
hist(sleep$StressScore, main="Stress Levels", xlab="Amount of Stress", ylab="Students")
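The correlation between alcohol-use group and stress score is extremely
low (about 0.016), so the two show almost no linear relationship.
AlcoholUseGroup combines Abstain with Light and Moderate with Heavy, so
this is not yet a direct comparison of abstainers with heavy drinkers.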
cor(sleep$Drinks, sleep$Gender, use = "complete.obs")
## [1] 0.3961698
hist(sleep$Drinks, main= "Number of Drinks per Week", xlab="Number of Drinks", ylab= "Students")
hist(sleep$Gender, main="Gender", xlab="Female Male", ylab="Students")
There is a moderate positive correlation (about 0.40) between the number
of drinks per week and gender. With Gender coded 0 for female and 1 for
male, male students tend to report more drinks per week.
### Q9: Is there a significant difference in the average weekday bedtime between students with high and low stress (Stress=High vs. Stress=Normal)?
cor(sleep$WeekdaySleep, sleep$StressScore)
## [1] -0.09220388
hist(sleep$WeekdaySleep, main="Average Hours of Sleep on Weekdays", ylab="Students", xlab= "Hours of Sleep")
hist(sleep$StressScore, main= "Amount of Stress", xlab="Stress Score", ylab="Students")
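The correlation between average weekday sleep (WeekdaySleep) and stress
score is weak (about -0.09), so the amount of weekday sleep has little
linear association with stress. The question itself concerns weekday
bedtime (WeekdayBed), which this correlation does not measure.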
cor(sleep$WeekdayBed, sleep$ClassYear)
## [1] -0.002674365
hist(sleep$WeekdayBed, main= "Average Weekday Bedtime", ylab="Students", xlab="Time(24 is Midnight)")
hist(sleep$ClassYear, main="Year of Schooling", xlab="Year", ylab="Students")
There is essentially no correlation (about -0.003) between weekday
bedtime (WeekdayBed) and class year. The question itself concerns hours
of sleep on weekends (WeekendSleep), which this correlation does not
measure.
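The last question compares students in their first two class years with everyone else, so ClassYear needs to be collapsed into two groups before the weekend sleep averages are compared. One possible way to do that is sketched below. `YearGroup` is a hypothetical column name, and the `toy` values are invented so the block runs on its own; on the real data the same steps would be applied to `sleep`.

```r
# Toy stand-in for two SleepStudy columns. Values are invented for
# illustration and are NOT taken from the real data set.
toy <- data.frame(
  ClassYear    = c(1, 1, 2, 2, 3, 3, 4, 4),
  WeekendSleep = c(8.5, 9.0, 8.0, 8.5, 7.5, 8.0, 7.0, 7.5)
)

# Years 1 and 2 form one group, years 3 and 4 the other.
toy$YearGroup <- ifelse(toy$ClassYear <= 2, "first two years", "later years")

# Welch two-sample t-test on hours of weekend sleep.
res <- t.test(WeekendSleep ~ YearGroup, data = toy)

res$estimate  # mean weekend sleep in each group
res$p.value
```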