This data set comes from a survey of statistics classes at UC Davis, collected in 2000. The study was designed to look for relationships between GPA, students’ personal behavior, physical attributes, and parental attributes. This report summarizes each variable in the study and examines correlations among several variables related to student behavior and academic performance.
The MindOnStats R package
Dataset published by: Jessica Utts and Robert Heckard
There are n = 173 observations in the study and 12 variables, covering a variety of personal habits along with the physical attributes of height and parental height. Variables are categorical (nominal or ordinal), integer, or numeric.
For each variable, a brief description is provided along with a summary and a plot. Missing data occur sporadically, with a few NAs in most columns. The summary notes whether an NA might carry meaning beyond an entry error or reluctance to answer. All NAs are omitted from the plots but listed in the variable summaries.
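Per-column NA counts like those listed in the variable summaries can be tabulated in one step. The following is a minimal base-R sketch; the helper name `na_counts` and the three-row data frame are hypothetical illustrations, not part of the original analysis. Applied to the report's `UCD` data frame, it would list the NA count for every variable at once.

```r
# Hypothetical helper: count missing values in every column of a data frame.
# is.na() on a data frame returns a logical matrix; colSums() totals each column.
na_counts <- function(df) colSums(is.na(df))

# Hypothetical three-row example (not survey data)
toy <- data.frame(
  TV      = c(2, NA, 10),
  alcohol = c(NA, NA, 0),
  Sex     = factor(c("Female", "Male", "Male"))
)
na_counts(toy)   # TV 1, alcohol 2, Sex 0
```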
Sex - Male or Female
A bar plot shows that there were more females than males in this study.
## Female Male
## 94 79
class - Liberal Arts or Non Liberal Arts
At a glance, there are far more non-liberal-arts majors (148) than liberal arts majors (25). This is likely because the data were collected in statistics classes, which science majors are more likely to attend.
## LibArts NonLib
## 25 148
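The imbalance is easier to judge as proportions. This sketch uses only the counts printed in the summary; `class_counts` and `shares` are hypothetical names.

```r
# Counts copied from the class summary
class_counts <- c(LibArts = 25, NonLib = 148)
shares <- prop.table(class_counts)   # each count divided by the total (173)
round(100 * shares, 1)               # LibArts 14.5, NonLib 85.5 (percent)
```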
Seat - Typical classroom seat location (Front Middle Back)
Preferred seating is roughly symmetric, peaking at the middle category. The two NAs may mean that those students have no specific preference.
## Front Middle Back NA's
## 41 93 37 2
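`summary()` reports NAs for a factor, but `table()` drops them unless asked. This sketch rebuilds a vector with the same counts as the Seat summary (a reconstruction from the counts, not the raw responses) to show the difference; `seat` is a hypothetical name.

```r
# Rebuild a Seat-like factor matching the counts in the summary
seat <- factor(c(rep("Front", 41), rep("Middle", 93), rep("Back", 37), NA, NA),
               levels = c("Front", "Middle", "Back"))
table(seat)                    # NAs silently excluded: totals 171
table(seat, useNA = "ifany")   # adds an <NA> cell when missing values exist: totals 173
```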
Numeric data
TV - Hours spent watching television each week
Most students watch less than 15 hours of TV per week, with an average of about 8.9 hours. There is one extreme outlier at 100 hours per week.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.000 6.000 8.882 12.000 100.000
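One conventional way to make "extreme outlier" precise is Tukey's rule: flag values more than 1.5 times the interquartile range (IQR) beyond the quartiles. The report does not say which rule it uses, so this is an assumption; the helper `tukey_fences` is hypothetical and takes its quartiles from the TV summary. On raw data, base R's `boxplot.stats()` applies a similar rule using Tukey's hinges, so its cutoffs can differ slightly from these.

```r
# Hypothetical helper: Tukey's fences from the first and third quartiles
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

tv_fences <- tukey_fences(q1 = 2, q3 = 12)  # quartiles from the TV summary
tv_fences                                   # lower -13, upper 27
100 > tv_fences[["upper"]]                  # TRUE: the 100-hour response lies far past the fence
```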
computer - Hours spent at a computer each week
On average, students spend about 5.4 more hours per week at the computer than watching TV. There is also an extreme outlier at 84 hours.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 6.00 10.00 14.26 20.00 84.00 2
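The gap between the means is partly driven by outliers such as the 84-hour computer response, so it can help to compare medians as well. This arithmetic uses only the rounded summary values for TV and computer; the vector names are hypothetical.

```r
# Means and medians copied from the TV and computer summaries
means   <- c(TV = 8.882, computer = 14.26)
medians <- c(TV = 6,     computer = 10)
means[["computer"]] - means[["TV"]]      # about 5.4 hours per week
medians[["computer"]] - medians[["TV"]]  # 4 hours per week
```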
Sleep - Hours of sleep previous night
Sleep appears roughly normally distributed, with the mean (6.9) and median (7) both close to 7 hours for the previous night. A surprising number of students reported less than 5 hours of sleep.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 6.000 7.000 6.935 8.000 12.000
alcohol - Alcoholic beverages consumed per week
The plot shows that most students drink little: the median is just 1 drink per week. The mean of about 4.1 is pulled up by several extreme outliers above 40, and there are 6 NAs. It is hard to tell whether the students who left this blank did not wish to disclose how much they drank or simply do not drink alcohol.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.000 1.000 4.108 4.250 55.000 6
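Because a few heavy drinkers pull the mean well above the median, resistant measures of center are worth reporting alongside it. This is a minimal sketch; `center_summary` and the drink counts are hypothetical, and `trim = 0.2` is an arbitrary choice that drops roughly the lowest and highest 20% of values before averaging.

```r
# Hypothetical helper: compare resistant and non-resistant measures of center, ignoring NAs
center_summary <- function(x, trim = 0.2) {
  c(mean    = mean(x, na.rm = TRUE),
    trimmed = mean(x, trim = trim, na.rm = TRUE),  # trims floor(n * trim) values from each end
    median  = median(x, na.rm = TRUE))
}

drinks <- c(0, 0, 0, 1, 1, 2, 4, 55, NA)  # hypothetical weekly drink counts
center_summary(drinks)                    # mean 7.875, trimmed about 1.33, median 1
```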
Height - Self-reported height, inches
Height appears roughly normally distributed, centered around 66.8 inches.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 57.00 64.00 66.00 66.77 69.25 77.00 2
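The report judges several variables to be normally distributed by eye. A normal Q-Q plot and a Shapiro-Wilk test are common base-R checks for that; this sketch wraps both in a hypothetical helper `check_normality` and runs it on simulated data rather than on the survey responses.

```r
# Hypothetical helper: visual and formal checks of approximate normality
check_normality <- function(x) {
  x <- x[!is.na(x)]                   # drop NAs, as the report does for its plots
  qqnorm(x, main = "Normal Q-Q plot")
  qqline(x)                           # reference line through the first and third quartiles
  shapiro.test(x)                     # Shapiro-Wilk; a small p-value suggests non-normality
}

# Illustration on simulated values (not the survey data)
set.seed(1)
simulated <- rnorm(100, mean = 66.8, sd = 4)
result <- check_normality(simulated)
result$p.value
```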
momheight - Mother’s height, inches
Mothers’ heights appear roughly normally distributed, centered around 63 inches.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 54.00 61.00 63.00 63.35 65.00 80.00 3
dadheight - Father’s height, inches
Fathers are taller than mothers on average, at about 69 inches compared with 63 inches.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 55.00 67.00 69.00 69.15 72.00 78.00 6
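Each student reported both parents’ heights, so the father/mother difference can also be examined within families rather than only by comparing the two averages. A paired t-test is one way to do this; it is an assumption here, not an analysis from the original report, and the five pairs below are hypothetical.

```r
# Hypothetical father/mother heights (inches) for five students; not survey data
dad <- c(70, 68, 72, 66, 71)
mom <- c(63, 64, 66, 62, 65)
mean(dad - mom)                         # 5.4: average within-family difference
tt <- t.test(dad, mom, paired = TRUE)   # pairs with an NA in either height would be dropped
tt$conf.int                             # 95% interval for the mean difference
```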
exercise - Hours spent exercising each week
Most students exercise less than 10 hours each week, with an average of 4.5 hours. There are several outliers above 20 hours, which could reflect a small number of collegiate athletes in the course.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 2.000 3.000 4.529 6.000 30.000 1
GPA - Student’s GPA
Students’ mean GPA is about 2.9 (median 3.0), with several outliers around and below 1.0. GPA has 9 NAs, the most of any variable, which could indicate that some students are freshmen who do not yet have a college GPA.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.548 2.500 3.000 2.915 3.300 4.000 9
Correlation 1
Exercise appears to be associated with GPA differently for males and females. Females who exercise more tend to have higher GPAs, while males who exercise more tend to have lower GPAs.
library(ggplot2)
ggplot(UCD, aes(x = exercise, y = GPA, color = Sex)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
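`geom_smooth(method = 'lm')` fits a separate least-squares line for each colour group. To read those slopes off numerically, one option is to fit `lm()` within each group; `group_slopes` and the toy data are hypothetical.

```r
# Hypothetical helper: least-squares slope of y on x within each group,
# mirroring the per-colour lines that geom_smooth(method = 'lm') draws
group_slopes <- function(df, x, y, group) {
  sapply(split(df, df[[group]]), function(d) {
    fit <- lm(reformulate(x, response = y), data = d)  # y ~ x within one group
    unname(coef(fit)[x])
  })
}

toy <- data.frame(
  exercise = c(1, 2, 3, 4, 1, 2, 3, 4),
  GPA      = c(2.6, 2.8, 3.0, 3.2, 3.4, 3.3, 3.2, 3.1),
  Sex      = rep(c("Female", "Male"), each = 4)
)
group_slopes(toy, "exercise", "GPA", "Sex")  # Female 0.2, Male -0.1
```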
Correlation 2
Students who sit in the front of the classroom appear to watch less TV.
ggplot(UCD, aes(x = TV, y = GPA, color = Seat)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  xlim(0, 50)  # drops the 100-hour outlier before the lines are fit
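Seat-level medians give a numeric check on the claim that front-row students watch less TV, and medians are less affected by the 100-hour response than means. This is a sketch on hypothetical data; with the formula interface, `aggregate()` drops rows where Seat or TV is NA.

```r
# Hypothetical data (not survey responses); the last row has no seat recorded
toy <- data.frame(
  Seat = factor(c("Front", "Front", "Middle", "Middle", "Back", "Back", NA),
                levels = c("Front", "Middle", "Back")),
  TV   = c(2, 4, 5, 9, 10, 14, 3)
)
res <- aggregate(TV ~ Seat, data = toy, FUN = median)  # NA seat row removed
res                                                    # Front 3, Middle 7, Back 12
```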
Correlation 3
Males who watch more TV tend to consume more alcohol.
ggplot(UCD, aes(x = TV, y = alcohol, color = Sex)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  xlim(0, 50)
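The plotted lines are least-squares fits, which outliers can pull strongly; `xlim(0, 50)` handles this by dropping the 100-hour response. A rank-based Spearman correlation is an alternative that keeps every point. In this hypothetical example `alcohol` increases with `tv` throughout, so Spearman's coefficient is exactly 1 while Pearson's falls below 1. On the survey data, `cor(..., use = "complete.obs")` would skip rows with NAs.

```r
# Hypothetical values: alcohol rises with tv everywhere, but one tv value is extreme
tv      <- c(1, 2, 3, 4, 100)
alcohol <- c(0, 1, 2, 4, 5)
pearson  <- cor(tv, alcohol)                       # linear; pulled around by the outlier
spearman <- cor(tv, alcohol, method = "spearman")  # rank-based; 1 for any strictly increasing pattern
c(pearson = pearson, spearman = spearman)
```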
Correlation 4
Median GPA falls the farther back a student sits in the classroom.
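# drop only rows missing Seat or GPA, not rows missing any other column
ggplot(subset(UCD, !is.na(Seat) & !is.na(GPA)), aes(x = Seat, y = GPA)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(2, 4))  # zoom without removing data from the box statistics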
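In ggplot2, setting limits with `ylim()` removes observations outside the range before statistics such as box-plot quartiles are computed, whereas `coord_cartesian(ylim = ...)` only zooms the view. The arithmetic below, on hypothetical GPAs for one seat group, shows how much dropping low values can shift a median.

```r
# Hypothetical GPAs for one seat group (not survey data)
gpa <- c(0.9, 1.8, 2.6, 3.0, 3.4)
median(gpa)                        # 2.6 using all values
median(gpa[gpa >= 2 & gpa <= 4])   # 3.0 after dropping values outside [2, 4]
```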
Correlation 5
Among males, alcohol consumption appears to rise sharply with hours of exercise.
ggplot(UCD, aes(x = exercise, y = alcohol, color = Sex)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  xlim(0, 50)
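Whether the exercise/alcohol slope really differs between males and females can be checked with an interaction term, as an alternative to comparing the plotted lines by eye. This sketch uses noise-free hypothetical data; the `exercise:SexMale` coefficient is the difference between the male and female slopes. On real data, `summary(fit)` would add a standard error and p-value for that term.

```r
# Hypothetical data: alcohol flat for females, rising 2 drinks per exercise hour for males
toy <- data.frame(
  exercise = c(0, 2, 4, 6, 0, 2, 4, 6),
  alcohol  = c(1, 1, 1, 1, 1, 5, 9, 13),
  Sex      = factor(rep(c("Female", "Male"), each = 4))  # Female is the baseline level
)
fit <- lm(alcohol ~ exercise * Sex, data = toy)  # separate slope per sex
coef(fit)  # (Intercept) 1, exercise 0, SexMale 0, exercise:SexMale 2
```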
Taken together, these plots suggest that healthier, more active student behaviors are associated with better academic performance, although the relationship between exercise and GPA differs by sex. Interestingly, exercise and alcohol consumption are positively correlated, particularly among males.
To expand this study, future work could include more student behavior variables and a larger sample, perhaps 500 to 1,000 students.
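The benefit of a larger sample can be gauged roughly with the usual 95% margin of error for a proportion, z * sqrt(p(1 - p)/n), with z = 1.96 and the worst case p = 0.5. This assumes simple random sampling, which a survey of statistics classes is not, so the figures are only indicative; `margin_of_error` is a hypothetical helper.

```r
# Hypothetical helper: approximate 95% margin of error for a sample proportion
margin_of_error <- function(n, p = 0.5, z = 1.96) z * sqrt(p * (1 - p) / n)

sizes <- c(current = 173, low = 500, high = 1000)
moe <- margin_of_error(sizes)
round(moe, 3)   # about 0.075, 0.044, 0.031
```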