This document analyzes the data available at https://www.lock5stat.com/datasets3e/SleepStudy.csv. The data describe college students' sleep habits, along with many variables that affect or are affected by sleep. The sample includes 253 students and 27 variables, such as gender, GPA, sleep quality, and alcohol use. The goal is to explore students' sleeping habits, how their sleep is affected by other aspects of their lives, and how their sleep can affect aspects of their lives related to school.
I propose the following 10 questions based on my understanding of the data.
We will explore the questions in detail and summarize the results.
# Load the SleepStudy data directly from the Lock5 site and preview the first rows
college <- read.csv("https://www.lock5stat.com/datasets3e/SleepStudy.csv")
head(college)
## Gender ClassYear LarkOwl NumEarlyClass EarlyClass GPA ClassesMissed
## 1 0 4 Neither 0 0 3.60 0
## 2 0 4 Neither 2 1 3.24 0
## 3 0 4 Owl 0 0 2.97 12
## 4 0 1 Lark 5 1 3.76 0
## 5 0 4 Owl 0 0 3.20 4
## 6 1 4 Neither 0 0 3.50 0
## CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1 -0.26 4 4 3 8
## 2 1.39 6 1 0 3
## 3 0.38 18 18 18 9
## 4 1.39 9 1 4 6
## 5 1.22 9 7 25 14
## 6 -0.04 6 14 8 28
## DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1 normal normal normal 15 28 Moderate 10
## 2 normal normal normal 4 25 Moderate 6
## 3 moderate severe normal 45 17 Light 3
## 4 normal normal normal 11 32 Light 2
## 5 normal severe normal 46 15 Moderate 4
## 6 moderate moderate high 50 22 Abstain 0
## WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1 25.75 8.70 7.70 25.75 9.50 5.88
## 2 25.70 8.20 6.80 26.00 10.00 7.25
## 3 27.44 6.55 3.00 28.00 12.59 10.09
## 4 23.50 7.17 6.77 27.00 8.00 7.25
## 5 25.90 8.67 6.09 23.75 9.50 7.00
## 6 23.80 8.95 9.05 26.00 10.75 9.00
## AverageSleep AllNighter
## 1 7.18 0
## 2 6.93 0
## 3 5.02 0
## 4 6.90 0
## 5 6.35 0
## 6 9.04 0
The t-test gives a p-value of 0.0001243, well below the standard 0.05 significance level, which suggests a significant difference in mean GPA between the two gender groups. The sample mean GPA is 3.32 for group 0 and 3.12 for group 1. The box plot shows the same difference.
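To put a size on this difference, the short sketch below recomputes the difference in means and its 95 percent confidence interval from the summary values printed by `t.test()` (the group means, the t statistic, and the degrees of freedom). The variable names are illustrative and not part of the original analysis, and the standard error is backed out from the rounded t statistic, so the result is approximate.

```r
# Summary values copied from the Welch t-test output for GPA by Gender
mean_group0 <- 3.324901
mean_group1 <- 3.123725
t_stat <- 3.9139
df <- 200.9

# Difference in sample means (group 0 minus group 1)
diff_means <- mean_group0 - mean_group1

# Standard error implied by the reported t statistic (t = difference / SE)
se_diff <- diff_means / t_stat

# 95% confidence interval: difference +/- critical t value * SE
margin <- qt(0.975, df) * se_diff
ci <- c(lower = diff_means - margin, upper = diff_means + margin)

print(round(diff_means, 3))
print(round(ci, 3))
```

The interval lies entirely above zero, which is consistent with the small p-value: the mean GPA of group 0 is estimated to be roughly 0.1 to 0.3 points higher than that of group 1.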
Based on the mean happiness score for each depression status, there appears to be a link between depression and happiness. Students in the normal depression category report the highest mean happiness (27.06), followed by the moderate category (23.09), while students in the severe category report the lowest (16.60).
Based on the mean GPA for each class year, there appears to be some relationship between the two. Mean GPA drops sharply after the first year (3.53 to 3.13) and then slowly rises again (3.21 in year three, 3.23 in year four). The correlation between class year and GPA is -0.1548046, a weak negative linear relationship. The large dip after year one could be caused by first-year classes being easier than second-year classes. A plausible explanation for why GPAs go up after year two is that students have had time to adapt to college and have learned new ways to study and succeed in class.
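The year-to-year pattern is easier to see as changes between consecutive class years. The sketch below uses the four group means printed by `aggregate()`; the vector name `gpa_by_year` is introduced here only for illustration.

```r
# Mean GPA by class year, copied from the aggregate() output
gpa_by_year <- c(year1 = 3.527872, year2 = 3.127579,
                 year3 = 3.213889, year4 = 3.231579)

# Change in mean GPA from each class year to the next
gpa_change <- diff(gpa_by_year)
print(round(gpa_change, 3))
```

The drop from year one to year two (about 0.40 points) is much larger than the later gains (about 0.09 and 0.02 points). Because the pattern is not a steady decline, a single correlation coefficient summarizes it only loosely.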
The t-test gives a p-value of 0.1421, which is greater than 0.05, so we cannot conclude that there is a significant difference in the number of missed classes between students with and without early classes. The group means differ slightly (2.65 missed classes for students without an early class versus 1.99 for students with one), but this could be due to random variation rather than to having an early class.
The scatter plot of average sleep against anxiety score, with its fitted trend line, shows that students with higher anxiety scores tend to get less sleep than those with lower anxiety scores. This is reasonable, as students with high anxiety are more likely to have trouble falling asleep and staying asleep throughout the night.
The t-test gives a p-value of 0.09479, which is larger than 0.05 but smaller than 0.1. The difference is therefore not statistically significant at the 0.05 level, although there may be a slight trend toward worse sleep quality among students who stayed up all night: their mean poor sleep quality score is 7.03, compared with 6.14 for students who did not. The box plot shows the same pattern, with a slightly higher poor sleep quality score for students who stayed up all night.
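For reference, the two-sided p-value can be recomputed from the reported t statistic and degrees of freedom. This is a minimal sketch using base R's `pt()`; the variable names are illustrative, and the inputs are the rounded values from the output, so the result matches only to a few decimal places.

```r
# Values copied from the Welch t-test output for PoorSleepQuality by AllNighter
t_stat <- -1.7068
df <- 44.708

# Two-sided p-value: probability of a t value at least this far from zero
p_value <- 2 * pt(-abs(t_stat), df)
print(round(p_value, 4))

# Compare against two common significance levels
print(p_value < 0.05)
print(p_value < 0.10)
```

The same conclusion follows from the reported 95 percent confidence interval (-1.95 to 0.16), which includes zero.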
The bar chart shows little to no change in mean stress score across alcohol use levels. Students who report heavy drinking appear to have very slightly higher stress scores, but the difference is not large enough to support any conclusion about the relationship between drinking and overall stress.
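One thing to keep in mind when reading the bar chart: `tapply()` groups a character column in alphabetical order, so the bars most likely do not run from least to most drinking. A possible way to order them is to convert the variable to a factor with explicit levels. The sketch below uses a small made-up data frame (`toy`) purely for illustration; the level names `Abstain`, `Light`, `Moderate`, and `Heavy` are assumed from the values seen in the data preview and in the text above.

```r
# Hypothetical toy data, NOT taken from the SleepStudy file
toy <- data.frame(
  AlcoholUse = c("Moderate", "Light", "Heavy", "Abstain", "Light", "Heavy"),
  StressScore = c(8, 6, 12, 7, 10, 14)
)

# Default behaviour: groups appear in alphabetical order
default_means <- tapply(toy$StressScore, toy$AlcoholUse, mean)
print(names(default_means))

# Set the level order explicitly, from least to most drinking (assumed level names)
toy$AlcoholUse <- factor(toy$AlcoholUse,
                         levels = c("Abstain", "Light", "Moderate", "Heavy"))
ordered_means <- tapply(toy$StressScore, toy$AlcoholUse, mean)
print(names(ordered_means))
```

Applying the same `factor()` call to `college$AlcoholUse` before calling `barplot()` should put the bars in that order, assuming the column contains exactly those four labels.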
The correlation between GPA and the number of drinks is -0.2693046, a weak negative relationship. The trend line on the chart slopes downward, so students who report drinking more tend to have lower GPAs than those who drink less. One plausible explanation is that students who drink more spend more time partying rather than studying and doing homework.
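To gauge how strong a correlation of this size is, it can be converted into a share of variance explained (r squared) and a t statistic for testing whether the true correlation is zero. The sketch below is a worked calculation under an assumption: it takes n = 253, the number of students stated in the introduction, and assumes no rows were dropped for missing values. The helper name `cor_summary` is hypothetical.

```r
# Summarize a Pearson correlation r based on n complete pairs (helper name is illustrative)
cor_summary <- function(r, n) {
  r_squared <- r^2                           # share of variance explained
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)  # t statistic with n - 2 degrees of freedom
  p_value <- 2 * pt(-abs(t_stat), df = n - 2)
  c(r = r, r_squared = r_squared, t = t_stat, p_value = p_value)
}

# Reported correlation between GPA and Drinks; n = 253 is an assumption
gpa_drinks <- cor_summary(-0.2693046, 253)
print(signif(gpa_drinks, 3))
```

Under that assumption, drinks account for about 7 percent of the variation in GPA, so the relationship, while unlikely to be due to chance alone, is weak. Running `cor.test(college$GPA, college$Drinks)` on the data would give the exact test without assuming n.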
The aggregated data show that, on average, the closer students get to graduation, the fewer early classes they take: the mean falls from 2.36 in year one to 1.19 in year four. The trend line on the chart shows the same steady decline. This matches my experience and makes sense: as students start taking more major-specific classes, they end up taking fewer early classes, since many of the early classes offered are general requirements rather than upper-level classes.
The chart and trend line show a positive relationship between GPA and students' cognition z-score. Students with higher GPAs typically score higher on cognitive tests, which is reasonable, since students with higher cognitive scores should generally be able to maintain a higher GPA.
t.test(GPA ~ Gender, data = college)
##
## Welch Two Sample t-test
##
## data: GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1
## 3.324901 3.123725
boxplot(GPA ~ Gender, data = college,
main = "GPA by Gender",
xlab = "Gender", ylab = "GPA",
col = c("lightblue", "lightpink"))
aggregate(Happiness ~ DepressionStatus, data = college, mean)
## DepressionStatus Happiness
## 1 moderate 23.08824
## 2 normal 27.05742
## 3 severe 16.60000
aggregate(GPA ~ ClassYear, data = college, mean)
## ClassYear GPA
## 1 1 3.527872
## 2 2 3.127579
## 3 3 3.213889
## 4 4 3.231579
cor(college$ClassYear, college$GPA, use = "complete.obs")
## [1] -0.1548046
t.test(ClassesMissed ~ EarlyClass, data = college)
##
## Welch Two Sample t-test
##
## data: ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.2233558 1.5412830
## sample estimates:
## mean in group 0 mean in group 1
## 2.647059 1.988095
aggregate(ClassesMissed ~ EarlyClass, data = college, mean)
## EarlyClass ClassesMissed
## 1 0 2.647059
## 2 1 1.988095
plot(AverageSleep ~ AnxietyScore, data = college,
main = "Average Sleep vs Anxiety Score", xlab = "Anxiety Score", ylab = "Average Sleep")
abline(lm(AverageSleep ~ AnxietyScore, data = college), col = "red")
t.test(PoorSleepQuality ~ AllNighter, data = college)
##
## Welch Two Sample t-test
##
## data: PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.9456958 0.1608449
## sample estimates:
## mean in group 0 mean in group 1
## 6.136986 7.029412
boxplot(PoorSleepQuality ~ AllNighter, data = college,
        main = "Sleep Quality Based on Staying Up All Night",
        xlab = "All-Nighter (0 = No, 1 = Yes)",
        ylab = "Poor Sleep Quality Score")
barplot(tapply(college$StressScore, college$AlcoholUse, mean, na.rm = TRUE),
main = "Average Stress Score by Alcohol Use",
xlab = "Alcohol Use", ylab = "Average Stress Score",
col = "lightblue")
cor(college$GPA, college$Drinks, use = "complete.obs")
## [1] -0.2693046
plot(GPA ~ Drinks, data = college,
main = "GPA vs. Drinks per Week", xlab = "Drinks per Week", ylab = "GPA")
abline(lm(GPA ~ Drinks, data = college), col = "red")
aggregate(NumEarlyClass ~ ClassYear, data = college, mean)
## ClassYear NumEarlyClass
## 1 1 2.361702
## 2 2 1.926316
## 3 3 1.425926
## 4 4 1.192982
plot(NumEarlyClass ~ ClassYear, data = college,
main = "Number of Early Classes by Class Year",
xlab = "Class Year",
ylab = "Number of Early Classes",
pch = 19, col = "blue")
abline(lm(NumEarlyClass ~ ClassYear, data = college), col = "red", lwd = 2)
# Scatter plot of cognition z-score against GPA with a fitted regression line
plot(CognitionZscore ~ GPA, data = college,
     main = "Cognition Z-Score by GPA",
     xlab = "GPA",
     ylab = "Cognition Z-Score",
     pch = 19, col = "blue")
abline(lm(CognitionZscore ~ GPA, data = college), col = "red", lwd = 2)