Background:
As a university student, I have been assigned by my professor to write a statistical report on college data sourced from Lock5 to demonstrate my understanding of hypothesis tests, confidence intervals, and general statistical methods. Working in the RStudio IDE, I address 10 statistical questions using R's calculation tools and visual aids. I analyze each question individually before summarizing my conclusions.
My report centers on the significance of differences between populations. I use null and alternative hypotheses, confidence intervals, t-tests, and box plots to show my findings.
Purpose:
Gain experience in RStudio programming
Reinforce my knowledge of statistics
Analyze the data provided in a creative and insightful way
Research Questions:
Research questions were taken from the D2L example, as suggested by the professor. Note that question #9 was changed because the original was difficult to program.
Is there a significant difference in the average GPA between male and female college students?
Is there a significant difference in the average number of early classes between the first two class years and other class years?
Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?
Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?
Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?
Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?
Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?
Is there a significant difference in the average number of drinks per week between students of different genders?
Is there a significant difference in the average GPA between students who report getting less than 7 hours of sleep on weekdays and those who report getting 7 or more hours of sleep?
Is there a significant difference in the average hours of sleep on weekends between students in their first two years and other students?
Variables and Observations:
253 observations
27 variables
Definitions of Variables used in Analysis:
(rest of variable definitions can be found at https://www.lock5stat.com/datasets3e/Lock5DataGuide3e.pdf)
Gender: 1=male, 0=female
ClassYear: Year in school, 1=first year, …, 4=senior
LarkOwl: Early riser or night owl? Lark, Neither, or Owl
NumEarlyClass: Number of classes per week before 9 am
EarlyClass: Indicator for any early classes
GPA: Grade point average (0-4 scale)
ClassesMissed: Number of classes missed in a semester
CognitionZscore: Z-score on a test of cognitive skills
PoorSleepQuality: Measure of sleep quality (higher values are poorer sleep)
DepressionScore: Measure of degree of depression
StressScore: Measure of amount of stress
DepressionStatus: Coded depression score: normal, moderate, or severe
Stress: Coded stress score: normal or high
Happiness: Measure of degree of happiness
AlcoholUse: Self-reported: Abstain, Light, Moderate, or Heavy
Drinks: Number of alcoholic drinks per week
WeekendSleep: Average hours of sleep on weekend days
AverageSleep: Average hours of sleep for all days
AllNighter: Had an all-nighter this semester? 1=yes, 0=no
As directed by my professor, I got the data from the Lock5 data set. According to the Lock5 data guide, ‘The data were obtained from a sample of students who did skills tests to measure cognitive function, completed a survey that asked many questions about attitudes and habits, and kept a sleep diary to record time and quality of sleep over a two week period.’ Lock5 sourced the data from a research study called “Class Start Times, Sleep, and Academic Performance in College: A Path Analysis”.
2-sample t-test
T-value
Degrees of freedom
P-value
null vs. alternative hypothesis
Confidence interval
Mean
Box Plot
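The items listed above all come out of a single call to R's `t.test()`. The following is a minimal sketch of that workflow, not the report's actual code: the data frame `toy` and its columns `score` and `group` are simulated stand-ins, not the Lock5 sample.

```r
# Simulated stand-in data (NOT the Lock5 sample): two groups of scores.
set.seed(1)
toy <- data.frame(
  score = c(rnorm(30, mean = 3.3, sd = 0.4), rnorm(25, mean = 3.1, sd = 0.4)),
  group = rep(c("A", "B"), times = c(30, 25))
)

# Welch two-sample t-test (R's default, var.equal = FALSE).
# H0: the two population means are equal; Ha: they differ.
res <- t.test(score ~ group, data = toy)

res$statistic   # t-value
res$parameter   # degrees of freedom (Welch approximation)
res$p.value     # two-sided p-value
res$conf.int    # 95% confidence interval for mean(A) - mean(B)
res$estimate    # the two group means

# Box plot of the same comparison.
boxplot(score ~ group, data = toy, ylab = "score")
```

With a formula such as `score ~ group`, the difference is taken as the first group level minus the second, which is why the sign of t and of the interval depends on how the groups are ordered.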
## Gender ClassYear LarkOwl NumEarlyClass EarlyClass GPA ClassesMissed
## 1 0 4 Neither 0 0 3.60 0
## 2 0 4 Neither 2 1 3.24 0
## 3 0 4 Owl 0 0 2.97 12
## 4 0 1 Lark 5 1 3.76 0
## 5 0 4 Owl 0 0 3.20 4
## 6 1 4 Neither 0 0 3.50 0
## CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1 -0.26 4 4 3 8
## 2 1.39 6 1 0 3
## 3 0.38 18 18 18 9
## 4 1.39 9 1 4 6
## 5 1.22 9 7 25 14
## 6 -0.04 6 14 8 28
## DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1 normal normal normal 15 28 Moderate 10
## 2 normal normal normal 4 25 Moderate 6
## 3 moderate severe normal 45 17 Light 3
## 4 normal normal normal 11 32 Light 2
## 5 normal severe normal 46 15 Moderate 4
## 6 moderate moderate high 50 22 Abstain 0
## WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1 25.75 8.70 7.70 25.75 9.50 5.88
## 2 25.70 8.20 6.80 26.00 10.00 7.25
## 3 27.44 6.55 3.00 28.00 12.59 10.09
## 4 23.50 7.17 6.77 27.00 8.00 7.25
## 5 25.90 8.67 6.09 23.75 9.50 7.00
## 6 23.80 8.95 9.05 26.00 10.75 9.00
## AverageSleep AllNighter
## 1 7.18 0
## 2 6.93 0
## 3 5.02 0
## 4 6.90 0
## 5 6.35 0
## 6 9.04 0
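Several of the tests in this report compare groups such as ClassGroup and DepressionGroup, which are not columns of the original data set. The sketch below shows one way such grouping variables might be derived, using only the six rows printed above. The cutoffs (years 1 and 2 versus later years, "normal" versus everything else) are my assumption based on the research questions and the group labels in the test output.

```r
# First six rows as printed above (only the columns needed here).
d <- data.frame(
  ClassYear        = c(4, 4, 4, 1, 4, 4),
  DepressionStatus = c("normal", "normal", "moderate",
                       "normal", "normal", "moderate")
)

# Assumed rule: years 1-2 versus years 3-4.
d$ClassGroup <- ifelse(d$ClassYear <= 2, "FirstTwoYears", "OtherYears")

# Assumed rule: moderate or severe depression versus normal.
d$DepressionGroup <- ifelse(d$DepressionStatus == "normal",
                            "Normal", "ModerateOrSevere")

table(d$ClassGroup)
table(d$DepressionGroup)
```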
##
## Welch Two Sample t-test
##
## data: GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## 0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1
## 3.324901 3.123725
t is 3.91, indicating that the observed difference is large relative to its standard error
df is 201 (Welch approximation), consistent with a large sample
p-value is 0.000124, indicating a very low chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference (female minus male) lies between 0.0998 and 0.303
Female average GPA is 3.32; Male average GPA is 3.12 (Gender: 0=female, 1=male)
There is a significant difference, as shown by the low p-value and high t; the 95% confidence interval also excludes zero. The box plot above illustrates a healthy gap in median GPA and interquartile ranges.
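The numbers in the output for this question are tied together by simple arithmetic: t is the difference in means divided by its standard error, and the confidence interval is the difference plus or minus a critical t-value times that standard error. The sketch below rebuilds the interval from the reported t, df, and group means only. The variable names are mine.

```r
# Values copied from the Welch output for GPA by Gender.
t_stat   <- 3.9139
df_welch <- 200.9
mean_0   <- 3.324901   # group 0 (female)
mean_1   <- 3.123725   # group 1 (male)

delta  <- mean_0 - mean_1            # observed difference in means
se     <- delta / t_stat             # standard error implied by t = delta / se
margin <- qt(0.975, df_welch) * se   # half-width of the 95% interval
ci     <- c(delta - margin, delta + margin)

round(ci, 4)
```

Because the inputs are rounded, the rebuilt interval should agree with the reported one (0.0998 to 0.3025) only to about three decimal places.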
##
## Welch Two Sample t-test
##
## data: NumEarlyClass by ClassGroup
## t = 4.1813, df = 250.69, p-value = 0.00004009
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
## 0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwoYears mean in group OtherYears
## 2.070423 1.306306
t is 4.18, indicating that the observed difference is large relative to its standard error
df is 251 (Welch approximation), consistent with a large sample
p-value is 0.0000401, indicating a very low chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference lies between 0.404 and 1.12
Underclassmen average early classes is 2.07; Upperclassmen average early classes is 1.31
There is a significant difference, as shown by the low p-value and high t; the 95% confidence interval also excludes zero.
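The degrees of freedom in the Welch output are not whole numbers (250.69 here) because they come from the Welch-Satterthwaite approximation, which depends on both group sizes and both group variances. The sketch below computes that approximation by hand and compares it with `t.test()`. The data are simulated stand-ins, not the Lock5 sample, and `welch_df` is a helper written for this illustration.

```r
# Simulated stand-in data (NOT the Lock5 sample).
set.seed(2)
x <- rnorm(140, mean = 2.1, sd = 1.5)
y <- rnorm(110, mean = 1.3, sd = 1.3)

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

df_manual <- welch_df(x, y)
df_ttest  <- unname(t.test(x, y)$parameter)

all.equal(df_manual, df_ttest)
```

The Welch df never exceeds n1 + n2 - 2, so a large df does suggest a large sample, but a small df can also result from one small or highly variable group.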
##
## Welch Two Sample t-test
##
## data: CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
## -0.1893561 0.4465786
## sample estimates:
## mean in group Lark mean in group Owl
## 0.09024390 -0.03836735
t is 0.806, indicating that the observed difference is small relative to its standard error
df is 75.3 (Welch approximation), consistent with a moderate sample
p-value is 0.423, indicating a 42.3% chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference lies between -0.189 and 0.447
‘Lark’ average cognition score is 0.0902; ‘Owl’ average cognition score is -0.0384
Larks do not have significantly better cognitive skills compared to owls, as shown by the low t and high p-value. In addition, the confidence interval includes zero, so we cannot reject the null hypothesis. The box plot above shows a very slight difference in median and interquartile ranges of cognition scores between larks and owls.
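The research question here is directional (do larks score better than owls?), while the reported test is two-sided. As a rough check, the sketch below recomputes the two-sided p-value from the reported t and df, and also gives the one-sided p-value for the alternative that the Lark mean is greater. This is my own follow-up arithmetic, not part of the original analysis.

```r
# Values copied from the Welch output for CognitionZscore by LarkOwl.
t_stat   <- 0.80571
df_welch <- 75.331

# Two-sided p-value: probability of a |t| at least this large under H0.
p_two_sided <- 2 * pt(abs(t_stat), df_welch, lower.tail = FALSE)

# One-sided p-value for Ha: mean(Lark) > mean(Owl).
p_greater <- pt(t_stat, df_welch, lower.tail = FALSE)

round(c(two_sided = p_two_sided, greater = p_greater), 4)
```

When calling `t.test()` directly, the one-sided version would be requested with `alternative = "greater"`. Either way, the p-value is far above 0.05, so the conclusion does not change.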
##
## Welch Two Sample t-test
##
## data: ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -0.2233558 1.5412830
## sample estimates:
## mean in group 0 mean in group 1
## 2.647059 1.988095
t is 1.48, indicating that the observed difference is modest relative to its standard error
df is 153 (Welch approximation), consistent with a large sample
p-value is 0.142, indicating a 14.2% chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference lies between -0.223 and 1.54
Non-early class students average missed classes is 2.65; Early class students average missed classes is 1.99
There is no significant difference, as shown by the high p-value and relatively low t. The 95% confidence interval also includes zero.
##
## Welch Two Sample t-test
##
## data: Happiness by DepressionGroup
## t = -5.6339, df = 55.594, p-value = 0.0000006057
## alternative hypothesis: true difference in means between group ModerateOrSevere and group Normal is not equal to 0
## 95 percent confidence interval:
## -7.379724 -3.507836
## sample estimates:
## mean in group ModerateOrSevere mean in group Normal
## 21.61364 27.05742
t is -5.63, indicating that the observed difference is large relative to its standard error
df is 55.6 (Welch approximation), consistent with a moderate sample
p-value is 0.000000606, indicating a very low chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference lies between -7.38 and -3.51
Average happiness of students with moderate or severe depression is 21.6; Average happiness of students with normal depression status is 27.1
There is a significant difference, as shown by the low p-value and high t magnitude; the 95% confidence interval also excludes zero.
##
## Welch Two Sample t-test
##
## data: PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -1.9456958 0.1608449
## sample estimates:
## mean in group 0 mean in group 1
## 6.136986 7.029412
t is -1.71, indicating that the observed difference is modest relative to its standard error
df is 44.7 (Welch approximation), consistent with a moderate sample
p-value is 0.0948, indicating a 9.48% chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference lies between -1.95 and 0.161
Average poor sleep quality score of non-all-nighter students is 6.14; Average poor sleep quality score of all-nighter students is 7.03 (higher values are poorer sleep)
There is no significant difference at the 5% level, as shown by the p-value above 0.05. Furthermore, the 95% confidence interval includes zero.
##
## Welch Two Sample t-test
##
## data: StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
## -6.261170 3.327346
## sample estimates:
## mean in group Abstain mean in group Heavy
## 8.970588 10.437500
t is -0.626, indicating that the observed difference is small relative to its standard error
df is 28.7 (Welch approximation), consistent with relatively small groups
p-value is 0.536, indicating a 53.6% chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference lies between -6.26 and 3.33
Average stress of abstaining students is 8.97; Average stress of heavy users is 10.4
There is no significant difference, as shown by the high p-value and low t. Furthermore, the 95% confidence interval includes zero.
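AlcoholUse has four levels (Abstain, Light, Moderate, Heavy), but a two-sample t-test needs exactly two groups, so the data must be restricted to Abstain and Heavy before testing. The sketch below shows one way this might be done. The data frame `toy` is simulated, not the Lock5 sample, and the exact subsetting used in the report is not shown, so this is an assumption.

```r
# Simulated stand-in data (NOT the Lock5 sample).
set.seed(3)
toy <- data.frame(
  StressScore = round(runif(80, min = 0, max = 30)),
  AlcoholUse  = sample(c("Abstain", "Light", "Moderate", "Heavy"),
                       80, replace = TRUE)
)

# Keep only the two groups being compared.
two_groups <- subset(toy, AlcoholUse %in% c("Abstain", "Heavy"))

# Welch two-sample t-test on the restricted data.
res <- t.test(StressScore ~ AlcoholUse, data = two_groups)
res$estimate
```

Running `t.test()` on the unrestricted data would stop with an error, because the grouping factor must have exactly 2 levels.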
##
## Welch Two Sample t-test
##
## data: Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 0.000000007002
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1
## 4.238411 7.539216
t is -6.16, indicating that the observed difference is large relative to its standard error
df is 143 (Welch approximation), consistent with a large sample
p-value is 0.00000000700, indicating a very low chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference (female minus male) lies between -4.36 and -2.24
Average drinks for female students is 4.24; Average drinks for male students is 7.54
There is a significant difference, as shown by the low p-value and high t magnitude; the 95% confidence interval also excludes zero.
##
## Welch Two Sample t-test
##
## data: GPA by SleepCategory
## t = 0.28997, df = 59.95, p-value = 0.7728
## alternative hypothesis: true difference in means between group 7 or more hours and group Less than 7 hours is not equal to 0
## 95 percent confidence interval:
## -0.1292119 0.1730252
## sample estimates:
## mean in group 7 or more hours mean in group Less than 7 hours
## 3.247864 3.225957
t is 0.290, indicating that the observed difference is small relative to its standard error
df is 60.0 (Welch approximation), consistent with a moderate sample
p-value is 0.773, indicating a 77.3% chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference lies between -0.129 and 0.173
Average GPA for students with 7 or more hours of sleep is 3.25; Average GPA for students with less than 7 hours of sleep is 3.23
There is no significant difference, as shown by the high p-value and low t. Moreover, the 95% confidence interval includes zero.
##
## Welch Two Sample t-test
##
## data: WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group First Two Years and group Other Years is not equal to 0
## 95 percent confidence interval:
## -0.3497614 0.3331607
## sample estimates:
## mean in group First Two Years mean in group Other Years
## 8.213592 8.221892
t is -0.0479, indicating that the observed difference is very small relative to its standard error
df is 237 (Welch approximation), consistent with a large sample
p-value is 0.962, indicating a 96.2% chance of observing a difference at least this extreme, assuming the null hypothesis is true
We are 95% confident that the true difference lies between -0.350 and 0.333
Average weekend sleep for underclassmen is 8.21; Average weekend sleep for upperclassmen is 8.22
There is no significant difference, as shown by the extremely high p-value and low t. Moreover, the 95% confidence interval includes zero.
The goal of this report was to assess the significance of differences across 10 research questions involving college student data. I determined that questions #1, #2, #5, and #8 displayed significant differences between the respective populations. Questions #3, #4, #6, #7, #9, and #10 did not display a significant difference.
This analysis is important because it indicates which behaviors and characteristics are associated with differences in student outcomes and which appear not to matter. Because the data are observational, these results show association, not cause. The analysis may also inspire additional statistical questions.
For instance, question #1 suggests that women have a higher college GPA than men, and statisticians may decide to explore the data further to uncover why. Question #2 suggests that underclassmen take more early classes than upperclassmen, and if someone were interested, they could survey students and counselors on why that is the case. Question #5 suggests that students with moderate or severe depression are less happy than students with normal depression status, which seems obvious, but it is consistent with the happiness measure behaving as expected. Question #8 suggests that male college students drink more than female students while having a lower average GPA, so a reasonable follow-up would be to research whether drinking is related to GPA.
Onyper SV, Thacher PV, Gilbert JW, Gradess SG. Class start times, sleep, and academic performance in college: a path analysis. Chronobiol Int. 2012 Apr;29(3):318-35. doi: 10.3109/07420528.2012.655868. PMID: 22390245.