University Admissions Example

A researcher is interested in how variables such as GRE (Graduate Record Exam) score, GPA (grade point average) and the prestige of the undergraduate institution affect admission into graduate school.

The response variable, admit/don’t admit, is a binary variable.

library(ggplot2)
library(knitr)     # kable()
library(magrittr)  # the %>% pipe

Admis <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ggplot2/SchoolAdmissions.csv")

# Same data set, with added model-based columns (fit, se.fit, UL, LL, PredictedProb)

Admis2 <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ggplot2/SchoolAdmissions2.csv")

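This data set has a binary response (outcome, dependent) variable called admit. There are three predictor variables: gre, gpa and rank. We will treat gre and gpa as continuous. The variable rank takes the values 1 through 4: institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. The file also contains a column rankP, which in the Admis2 rows shown below equals PredictedProb, a model-predicted probability of admission. We can get basic descriptives for the entire data set by using summary(). Because rank is stored as a number, summary() reports a mean and quartiles for it; these should not be read as meaningful for an ordered category.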
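Because rank is an ordered category rather than a measurement, it is often stored as a factor before modelling. The sketch below assumes the CSV URL used above is reachable and uses the hypothetical names admis_cat, admit_by_rank and admit_rate. It leaves Admis unchanged, converts rank to a factor in a copy, and tabulates admission outcomes by rank with base R functions (transform, xtabs, prop.table).

```r
# Sketch (assumption): treat rank as categorical in a copy of the data.
# admis_cat, admit_by_rank and admit_rate are hypothetical names.
Admis <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ggplot2/SchoolAdmissions.csv")
admis_cat <- transform(Admis, rank = factor(rank, levels = 1:4))

# Standard deviations of the two continuous predictors
print(sapply(admis_cat[, c("gre", "gpa")], sd))

# Counts of admitted (1) and not admitted (0) applicants for each rank
admit_by_rank <- xtabs(~ admit + rank, data = admis_cat)
print(admit_by_rank)

# Observed admission rate within each rank:
# column proportions, keeping the row where admit == 1
admit_rate <- prop.table(admit_by_rank, margin = 2)["1", ]
print(round(admit_rate, 3))
```

Keeping the conversion in a copy means the summary() output below still describes the data exactly as read from the file.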

Summary Statistics

summary(Admis) %>% kable()
| Statistic | admit | gre | gpa | rank | rankP |
|---|---|---|---|---|---|
| Min. | 0.0000 | 220.0 | 2.260 | 1.000 | 0.05879 |
| 1st Qu. | 0.0000 | 520.0 | 3.130 | 2.000 | 0.19543 |
| Median | 0.0000 | 580.0 | 3.395 | 2.000 | 0.29799 |
| Mean | 0.3175 | 587.7 | 3.390 | 2.485 | 0.31750 |
| 3rd Qu. | 1.0000 | 660.0 | 3.670 | 3.000 | 0.40139 |
| Max. | 1.0000 | 800.0 | 4.000 | 4.000 | 0.73841 |

Data with additional statistical information

head(Admis2) %>% kable()
| admit | gre | gpa | rank | rankP | fit | se.fit | residual.scale | UL | LL | PredictedProb |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 380 | 3.61 | 3 | 0.1726265 | -1.5671256 | 0.3359675 | 1 | 0.2872804 | 0.0974731 | 0.1726265 |
| 1 | 660 | 3.67 | 3 | 0.2921750 | -0.8848442 | 0.2297198 | 1 | 0.3930300 | 0.2083178 | 0.2921750 |
| 1 | 800 | 4.00 | 1 | 0.7384082 | 1.0377118 | 0.3480671 | 1 | 0.8481189 | 0.5879508 | 0.7384082 |
| 1 | 640 | 3.19 | 4 | 0.1783846 | -1.5273305 | 0.3373684 | 1 | 0.2960689 | 0.1007814 | 0.1783846 |
| 0 | 520 | 2.93 | 4 | 0.1183539 | -2.0081113 | 0.3552036 | 1 | 0.2121670 | 0.0627195 | 0.1183539 |
| 1 | 760 | 3.00 | 2 | 0.3699699 | -0.5323458 | 0.3023582 | 1 | 0.5150645 | 0.2450910 | 0.3699699 |
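The extra columns in Admis2 look like output from predict() on a logistic regression: fit and se.fit are on the log-odds scale, residual.scale is the constant 1 that predict.glm() reports for a binomial model, PredictedProb is the inverse logit of fit, and UL/LL appear to be approximate 95% limits computed as plogis(fit ± 1.96 × se.fit). The source does not state which model produced them. The sketch below assumes glm(admit ~ gre + gpa + factor(rank), family = binomial); the names fit_glm, pred and Admis_pred are hypothetical.

```r
# Sketch (assumption): one way the extra Admis2 columns could be produced.
# The model formula is an assumption; the source does not state it.
Admis <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ggplot2/SchoolAdmissions.csv")

# Logistic regression with rank treated as a categorical predictor
fit_glm <- glm(admit ~ gre + gpa + factor(rank), data = Admis, family = binomial)

# Predictions on the link (log-odds) scale; with se.fit = TRUE,
# predict.glm() returns a list with fit, se.fit and residual.scale
pred <- predict(fit_glm, type = "link", se.fit = TRUE)
Admis_pred <- cbind(Admis, pred)

# Back-transform to probabilities with the inverse logit (plogis);
# the limits are built on the link scale, then transformed
Admis_pred$PredictedProb <- plogis(Admis_pred$fit)
Admis_pred$UL <- plogis(Admis_pred$fit + 1.96 * Admis_pred$se.fit)
Admis_pred$LL <- plogis(Admis_pred$fit - 1.96 * Admis_pred$se.fit)

print(head(Admis_pred))
```

Building the interval on the log-odds scale and transforming afterwards keeps UL and LL inside [0, 1], which a symmetric interval on the probability scale would not guarantee.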

Plots

ggplot(Admis2, aes(x = gre, y = PredictedProb)) + 
  geom_point(aes(colour = factor(rank)))
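The plot above uses only PredictedProb, although Admis2 also carries the interval limits LL and UL. A minimal sketch (the name p_ci is hypothetical) that draws each applicant's interval as a faint vertical bar behind the points:

```r
library(ggplot2)

Admis2 <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ggplot2/SchoolAdmissions2.csv")

# Predicted probability per applicant, with the LL-UL interval as a
# vertical bar; colour distinguishes the four institution ranks
p_ci <- ggplot(Admis2, aes(x = gre, y = PredictedProb, colour = factor(rank))) +
  geom_errorbar(aes(ymin = LL, ymax = UL), width = 0, alpha = 0.3) +
  geom_point() +
  labs(colour = "rank", y = "Predicted probability of admission")

print(p_ci)
```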

Simple Regression Fit

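ggplot(Admis2, aes(x = gre, y = PredictedProb)) + 
  geom_point(aes(colour = factor(rank))) + 
  # One pooled straight-line fit (with confidence band) across all ranks;
  # linewidth needs ggplot2 >= 3.4.0 (use size = 1 on older versions)
  geom_smooth(method = "lm", formula = y ~ x, linewidth = 1)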

Disaggregate Regression Fits

ggplot(Admis2, aes(x = gre, y = PredictedProb)) + 
  geom_point(aes(colour = factor(rank))) + 
  # Separate straight-line fit for each rank, without confidence bands
  geom_smooth(aes(colour = factor(rank)), method = "lm", formula = y ~ x,
              se = FALSE, linewidth = 1)
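geom_smooth(method = "lm") fits an ordinary least-squares line of y on x within each group, so the pooled and per-rank lines in the two plots above can also be inspected numerically. A sketch using base R lm(), with the hypothetical names pooled and by_rank:

```r
Admis2 <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ggplot2/SchoolAdmissions2.csv")

# Pooled fit: the single line from the "Simple Regression Fit" plot
pooled <- coef(lm(PredictedProb ~ gre, data = Admis2))

# Disaggregated fits: one line per rank, matching the groups that
# colour = factor(rank) creates in geom_smooth()
by_rank <- t(sapply(split(Admis2, Admis2$rank),
                    function(d) coef(lm(PredictedProb ~ gre, data = d))))

print(pooled)
print(by_rank)  # one row per rank: intercept and gre slope
```

Comparing the gre slope in pooled with the four slopes in by_rank shows how much of the pooled trend is shared within ranks.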