1. Introduction

Background:

As a university student, I have been assigned by my professor to write a statistical report on college data sourced from Lock5 to demonstrate my understanding of hypothesis tests, confidence intervals, and general statistical methods. Using the RStudio IDE, I will address 10 statistical questions with R's calculation tools and visual aids. Lastly, I will analyze each test individually before summarizing my conclusions.

My report will center on the significance of differences between populations. I will use null and alternative hypotheses, confidence intervals, t-tests, and box plots to show my findings.

Purpose:

Research Questions:

Research questions were taken from the D2L example, as suggested by the professor. Note that question #9 was changed because the original version was difficult to program.

2. Data

Variables and Observations:

Definitions of Variables used in Analysis:

(rest of variable definitions can be found at https://www.lock5stat.com/datasets3e/Lock5DataGuide3e.pdf)

Gender: 1=male, 0=female

ClassYear: Year in school, 1=first year, …, 4=senior

LarkOwl: Early riser or night owl? Lark, Neither, or Owl

NumEarlyClass: Number of classes per week before 9 am

EarlyClass: Indicator for any early classes

GPA: Grade point average (0-4 scale)

ClassesMissed: Number of classes missed in a semester

CognitionZscore: Z-score on a test of cognitive skills

PoorSleepQuality: Measure of sleep quality (higher values are poorer sleep)

DepressionScore: Measure of degree of depression

StressScore: Measure of amount of stress

DepressionStatus: Coded depression score: normal, moderate, or severe

Stress: Coded stress score: normal or high

Happiness: Measure of degree of happiness

AlcoholUse: Self-reported: Abstain, Light, Moderate, or Heavy

Drinks: Number of alcoholic drinks per week

WeekendSleep: Average hours of sleep on weekends

AverageSleep: Average hours of sleep for all days

AllNighter: Had an all-nighter this semester? 1=yes, 0=no

Data Collection:

As directed by my professor, I obtained the data from the Lock5 data set. According to the Lock5 data guide, ‘The data were obtained from a sample of students who did skills tests to measure cognitive function, completed a survey that asked many questions about attitudes and habits, and kept a sleep diary to record time and quality of sleep over a two week period.’ Lock5 sourced the data from a research study titled “Class Start Times, Sleep, and Academic Performance in College: A Path Analysis”.

Statistical Methods:

  • 2-sample t-test

  • T-value

  • Degrees of freedom

  • P-value

  • null vs. alternative hypothesis

  • Confidence interval

  • Mean

  • Box Plot

3. Analysis

##   Gender ClassYear LarkOwl NumEarlyClass EarlyClass  GPA ClassesMissed
## 1      0         4 Neither             0          0 3.60             0
## 2      0         4 Neither             2          1 3.24             0
## 3      0         4     Owl             0          0 2.97            12
## 4      0         1    Lark             5          1 3.76             0
## 5      0         4     Owl             0          0 3.20             4
## 6      1         4 Neither             0          0 3.50             0
##   CognitionZscore PoorSleepQuality DepressionScore AnxietyScore StressScore
## 1           -0.26                4               4            3           8
## 2            1.39                6               1            0           3
## 3            0.38               18              18           18           9
## 4            1.39                9               1            4           6
## 5            1.22                9               7           25          14
## 6           -0.04                6              14            8          28
##   DepressionStatus AnxietyStatus Stress DASScore Happiness AlcoholUse Drinks
## 1           normal        normal normal       15        28   Moderate     10
## 2           normal        normal normal        4        25   Moderate      6
## 3         moderate        severe normal       45        17      Light      3
## 4           normal        normal normal       11        32      Light      2
## 5           normal        severe normal       46        15   Moderate      4
## 6         moderate      moderate   high       50        22    Abstain      0
##   WeekdayBed WeekdayRise WeekdaySleep WeekendBed WeekendRise WeekendSleep
## 1      25.75        8.70         7.70      25.75        9.50         5.88
## 2      25.70        8.20         6.80      26.00       10.00         7.25
## 3      27.44        6.55         3.00      28.00       12.59        10.09
## 4      23.50        7.17         6.77      27.00        8.00         7.25
## 5      25.90        8.67         6.09      23.75        9.50         7.00
## 6      23.80        8.95         9.05      26.00       10.75         9.00
##   AverageSleep AllNighter
## 1         7.18          0
## 2         6.93          0
## 3         5.02          0
## 4         6.90          0
## 5         6.35          0
## 6         9.04          0

Q1: Is there a significant difference in the average GPA between male and female college students?

## 
##  Welch Two Sample t-test
## 
## data:  GPA by Gender
## t = 3.9139, df = 200.9, p-value = 0.0001243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  0.09982254 0.30252780
## sample estimates:
## mean in group 0 mean in group 1 
##        3.324901        3.123725

  • t is 3.91, indicating that the observed difference is large relative to its standard error

  • df is 201, indicating a large sample size

  • p-value is 0.000124, indicating a very low chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean GPA (female minus male) lies between 0.0998 and 0.303

  • Female average GPA is 3.32; Male average GPA is 3.12

There is a significant difference: the p-value is far below 0.05, the level that corresponds to the 95% confidence interval, and the interval excludes zero. The box plot above illustrates a clear gap in median GPA and interquartile ranges.
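To make the link between the reported quantities concrete, the following sketch recomputes the p-value and confidence interval for Q1 from the t statistic, degrees of freedom, and group means printed above. It uses only base R; the variable names are my own, and the results should agree with the output only up to rounding, because the inputs are the rounded values from the printout.

```r
# Values copied from the Welch t-test output for Q1 (GPA by Gender)
t_stat  <- 3.9139
df      <- 200.9
mean_g0 <- 3.324901   # group 0 (female, per the variable definitions)
mean_g1 <- 3.123725   # group 1 (male)

# Observed difference, group 0 minus group 1
diff_means <- mean_g0 - mean_g1

# Since t = difference / standard error, the standard error can be recovered
std_error <- diff_means / t_stat

# Two-sided p-value: probability of |T| >= |t| under the null hypothesis
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# 95% confidence interval: difference +/- critical value * standard error
t_crit <- qt(0.975, df)
ci <- diff_means + c(-1, 1) * t_crit * std_error

print(signif(p_value, 4))
print(round(ci, 4))
```

The same arithmetic applies to each of the other nine tests by substituting the corresponding t, df, and group means.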

Q2: Is there a significant difference in the average number of early classes between the first two class years and other class years?

## 
##  Welch Two Sample t-test
## 
## data:  NumEarlyClass by ClassGroup
## t = 4.1813, df = 250.69, p-value = 0.00004009
## alternative hypothesis: true difference in means between group FirstTwoYears and group OtherYears is not equal to 0
## 95 percent confidence interval:
##  0.4042016 1.1240309
## sample estimates:
## mean in group FirstTwoYears    mean in group OtherYears 
##                    2.070423                    1.306306

  • t is 4.18, indicating that the observed difference is large relative to its standard error

  • df is 251, indicating a large sample size

  • p-value is 0.0000401, indicating a very low chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean early classes (first two years minus other years) lies between 0.404 and 1.12

  • Underclassmen average early classes is 2.07; Upperclassmen average early classes is 1.31

There is a significant difference: the p-value is far below 0.05 and the 95% confidence interval excludes zero.
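The grouping variable ClassGroup in the output above is not part of the original data, so it had to be derived from ClassYear. The sketch below shows one way this could be done; it is an illustration under assumptions, not necessarily the code used for this report. The data frame `toy` contains made-up values, not rows from the Lock5 data.

```r
# Toy data for illustration only; these are NOT rows from the Lock5 data
toy <- data.frame(
  ClassYear     = c(1, 1, 2, 2, 3, 3, 4, 4),
  NumEarlyClass = c(3, 2, 4, 1, 1, 0, 2, 0)
)

# Collapse the four class years into two groups
toy$ClassGroup <- ifelse(toy$ClassYear <= 2, "FirstTwoYears", "OtherYears")

# Welch two-sample t-test (t.test() does not assume equal variances by default)
result <- t.test(NumEarlyClass ~ ClassGroup, data = toy)

print(result$estimate)
print(result$conf.int)
```

The same pattern, with a different cutoff condition inside `ifelse()`, would produce two-level groupings such as the depression, sleep, and class-year groups used in Q5, Q9, and Q10.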

Q3: Do students who identify as “larks” have significantly better cognitive skills (cognition z-score) compared to “owls”?

## 
##  Welch Two Sample t-test
## 
## data:  CognitionZscore by LarkOwl
## t = 0.80571, df = 75.331, p-value = 0.4229
## alternative hypothesis: true difference in means between group Lark and group Owl is not equal to 0
## 95 percent confidence interval:
##  -0.1893561  0.4465786
## sample estimates:
## mean in group Lark  mean in group Owl 
##         0.09024390        -0.03836735

  • t is 0.806, indicating that the observed difference is small relative to its standard error

  • df is 75.3, indicating a relatively large sample size

  • p-value is 0.423, indicating a 42.3% chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean cognition z-score (lark minus owl) lies between -0.189 and 0.447

  • ‘Lark’ average cognition score is 0.0902; ‘Owl’ average cognition score is -0.0384

Larks do not have significantly better cognitive skills than owls: the p-value is well above 0.05. In addition, the 95% confidence interval includes zero, so we cannot reject the null hypothesis. The box plot above shows only a very slight difference in the median and interquartile ranges of cognition scores between larks and owls.
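Two details of Q3 are worth illustrating: LarkOwl has three levels, so the data must be restricted to larks and owls before a two-sample test, and the question is directional ("better"), whereas the output above is from a two-sided test. The sketch below shows both steps on made-up data; `toy` and its values are hypothetical, and the one-sided test is shown as an alternative for comparison, not as the analysis used in this report.

```r
# Toy data for illustration only; these are NOT rows from the Lock5 data
toy <- data.frame(
  LarkOwl         = c("Lark", "Lark", "Lark", "Neither", "Neither",
                      "Owl", "Owl", "Owl"),
  CognitionZscore = c(0.5, 0.1, 0.3, 0.0, -0.2, -0.1, 0.2, -0.4)
)

# Keep only the two groups being compared; t.test() needs exactly two groups
lark_owl <- subset(toy, LarkOwl %in% c("Lark", "Owl"))

# Two-sided test: is there any difference between the group means?
two_sided <- t.test(CognitionZscore ~ LarkOwl, data = lark_owl)

# One-sided test: groups are ordered alphabetically (Lark, then Owl),
# so "greater" tests whether mean(Lark) exceeds mean(Owl)
one_sided <- t.test(CognitionZscore ~ LarkOwl, data = lark_owl,
                    alternative = "greater")

print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))
```

When the observed difference points in the hypothesized direction, the one-sided p-value is half the two-sided one. For the Q3 output above, that would still be far above 0.05, so the conclusion would not change.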

Q4: Is there a significant difference in the average number of classes missed in a semester between students who had at least one early class (EarlyClass=1) and those who didn’t (EarlyClass=0)?

## 
##  Welch Two Sample t-test
## 
## data:  ClassesMissed by EarlyClass
## t = 1.4755, df = 152.78, p-value = 0.1421
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2233558  1.5412830
## sample estimates:
## mean in group 0 mean in group 1 
##        2.647059        1.988095

  • t is 1.48, indicating that the observed difference is moderate relative to its standard error

  • df is 153, indicating a large sample size

  • p-value is 0.142, indicating a 14.2% chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean classes missed (no early class minus early class) lies between -0.223 and 1.54

  • Non-early class students average missed classes is 2.65; Early class students average missed classes is 1.99

There is no significant difference: the p-value is above 0.05 and the 95% confidence interval includes zero.

Q5: Is there a significant difference in the average happiness level between students with at least moderate depression and normal depression status?

## 
##  Welch Two Sample t-test
## 
## data:  Happiness by DepressionGroup
## t = -5.6339, df = 55.594, p-value = 0.0000006057
## alternative hypothesis: true difference in means between group ModerateOrSevere and group Normal is not equal to 0
## 95 percent confidence interval:
##  -7.379724 -3.507836
## sample estimates:
## mean in group ModerateOrSevere           mean in group Normal 
##                       21.61364                       27.05742

  • t is -5.63, indicating that the observed difference is large relative to its standard error

  • df is 55.6, indicating a moderate sample size

  • p-value is 0.000000606, indicating a very low chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean happiness (moderate or severe depression minus normal) lies between -7.38 and -3.51

  • Average happiness of students with moderate or severe depression is 21.6; Average happiness of students with normal depression status is 27.1

There is a significant difference: the p-value is far below 0.05 and the 95% confidence interval excludes zero.

Q6: Is there a significant difference in average sleep quality scores between students who reported having at least one all-nighter (AllNighter=1) and those who didn’t (AllNighter=0)?

## 
##  Welch Two Sample t-test
## 
## data:  PoorSleepQuality by AllNighter
## t = -1.7068, df = 44.708, p-value = 0.09479
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.9456958  0.1608449
## sample estimates:
## mean in group 0 mean in group 1 
##        6.136986        7.029412
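
  • t is -1.71, indicating that the observed difference is moderate relative to its standard error

  • df is 44.7, indicating a moderate sample size

  • p-value is 0.0948, indicating a 9.48% chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean poor sleep quality score (no all-nighter minus all-nighter) lies between -1.95 and 0.161

  • Average poor sleep quality score of non-all-nighter students is 6.14; Average poor sleep quality score of all-nighter students is 7.03 (higher values are poorer sleep)

There is no significant difference at the 0.05 level, as shown by the p-value of 0.0948. Furthermore, the 95% confidence interval includes zero.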

Q7: Do students who abstain from alcohol use have significantly better stress scores than those who report heavy alcohol use?

## 
##  Welch Two Sample t-test
## 
## data:  StressScore by AlcoholUse
## t = -0.62604, df = 28.733, p-value = 0.5362
## alternative hypothesis: true difference in means between group Abstain and group Heavy is not equal to 0
## 95 percent confidence interval:
##  -6.261170  3.327346
## sample estimates:
## mean in group Abstain   mean in group Heavy 
##              8.970588             10.437500

  • t is -0.626, indicating that the observed difference is small relative to its standard error

  • df is 28.7, indicating a relatively small sample size

  • p-value is 0.536, indicating a 53.6% chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean stress score (abstain minus heavy) lies between -6.26 and 3.33

  • Average stress of abstaining students is 8.97; Average stress of heavy users is 10.4

Students who abstain do not have significantly better stress scores than heavy users: the p-value is well above 0.05. Furthermore, the 95% confidence interval includes zero.

Q8: Is there a significant difference in the average number of drinks per week between students of different genders?

## 
##  Welch Two Sample t-test
## 
## data:  Drinks by Gender
## t = -6.1601, df = 142.75, p-value = 0.000000007002
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.360009 -2.241601
## sample estimates:
## mean in group 0 mean in group 1 
##        4.238411        7.539216

  • t is -6.16, indicating that the observed difference is large relative to its standard error

  • df is 143, indicating a large sample size

  • p-value is 0.00000000700, indicating a very low chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean drinks per week (female minus male) lies between -4.36 and -2.24

  • Average drinks for female students is 4.24; Average drinks for male students is 7.54

There is a significant difference: the p-value is far below 0.05 and the 95% confidence interval excludes zero.

Q9: Is there a significant difference in the average GPA between students who report getting less than 7 hours of sleep on weekdays and those who report getting 7 or more hours of sleep?

## 
##  Welch Two Sample t-test
## 
## data:  GPA by SleepCategory
## t = 0.28997, df = 59.95, p-value = 0.7728
## alternative hypothesis: true difference in means between group 7 or more hours and group Less than 7 hours is not equal to 0
## 95 percent confidence interval:
##  -0.1292119  0.1730252
## sample estimates:
##   mean in group 7 or more hours mean in group Less than 7 hours 
##                        3.247864                        3.225957

  • t is 0.290, indicating that the observed difference is small relative to its standard error

  • df is 60.0, indicating a moderate sample size

  • p-value is 0.773, indicating a 77.3% chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean GPA (7 or more hours minus less than 7 hours) lies between -0.129 and 0.173

  • Average GPA for students with 7 or more hours of sleep is 3.25; Average GPA for students with less than 7 hours of sleep is 3.23

There is no significant difference: the p-value is well above 0.05. Moreover, the 95% confidence interval includes zero.

Q10: Is there a significant difference in the average hours of sleep on weekends between first two year students and other students?

## 
##  Welch Two Sample t-test
## 
## data:  WeekendSleep by YearGroup
## t = -0.047888, df = 237.36, p-value = 0.9618
## alternative hypothesis: true difference in means between group First Two Years and group Other Years is not equal to 0
## 95 percent confidence interval:
##  -0.3497614  0.3331607
## sample estimates:
## mean in group First Two Years     mean in group Other Years 
##                      8.213592                      8.221892

  • t is -0.0479, indicating that the observed difference is very small relative to its standard error

  • df is 237, indicating a large sample size

  • p-value is 0.962, indicating a 96.2% chance of observing a difference at least this extreme, assuming the null hypothesis is true

  • We are 95% confident that the true difference in mean weekend sleep hours (first two years minus other years) lies between -0.350 and 0.333

  • Average weekend sleep for underclassmen is 8.21; Average weekend sleep for upperclassmen is 8.22

There is no significant difference: the p-value is extremely high. Moreover, the 95% confidence interval includes zero.

4. Summary

The goal of this report is to assess the significance of differences across 10 research questions involving college student data. I determined that questions #1, #2, #5, and #8 displayed significant differences between respective populations. Questions #3, #4, #6, #7, #9, and #10 did not display a significant difference.
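The classification above can be checked by collecting the ten reported p-values and comparing each to a significance level. This sketch assumes a level of 0.05, chosen to match the 95% confidence intervals used throughout; the report does not state a level explicitly, and the object names are my own.

```r
# p-values copied from the ten Welch t-test outputs in the Analysis section
p_values <- c(Q1 = 0.0001243, Q2 = 0.00004009, Q3 = 0.4229, Q4 = 0.1421,
              Q5 = 0.0000006057, Q6 = 0.09479, Q7 = 0.5362,
              Q8 = 0.000000007002, Q9 = 0.7728, Q10 = 0.9618)

# Assumed significance level, matching the 95% confidence intervals
alpha <- 0.05

# One row per research question, flagged as significant or not
summary_table <- data.frame(
  question    = names(p_values),
  p_value     = unname(p_values),
  significant = unname(p_values < alpha)
)

print(summary_table)
```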

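This analysis is important because it shows which student characteristics and behaviors are associated with differences in outcomes and which show no detectable difference. Because the data come from a survey rather than an experiment, these results describe associations, not causes. The analysis may also inspire additional statistical questions.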

For instance, question #1 suggests that women have a higher college GPA than men (3.32 versus 3.12, with Gender coded 0=female), and statisticians may decide to explore the data further to uncover why. Question #2 suggests that underclassmen take more early classes than upperclassmen, and if someone were interested, they could survey students and counselors on why that is the case. Question #5 suggests that students with moderate or severe depression are less happy than students with normal depression status, which seems obvious, but it is consistent with the happiness measure behaving as expected. Question #8 suggests that male college students drink more than female students while having a lower average GPA, so a reasonable follow-up would be to research whether drinking is related to GPA.

5. References