A researcher is interested in how variables such as GRE (Graduate Record Exam) scores, GPA (grade point average) and the prestige of the undergraduate institution affect admission into graduate school.
The response variable, admit/don’t admit, is a binary variable.
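A common model for a binary response like this is logistic regression, which models the log-odds of admission as a linear function of the predictors. The model is not named in this section, so the equation below is an assumption. It treats rank (the prestige of the undergraduate institution, coded 1 to 4) as categorical, with rank 1 as the reference level:

$$
\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1\,\text{gre}_i + \beta_2\,\text{gpa}_i + \sum_{k=2}^{4} \gamma_k\,\mathbf{1}[\text{rank}_i = k],
\qquad p_i = \Pr(\text{admit}_i = 1) = \frac{1}{1 + e^{-\eta_i}},
$$

where $\eta_i$ denotes the right-hand side of the first equation, which is the linear predictor on the log-odds scale.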
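```r
library(magrittr)  # provides the %>% pipe
library(knitr)     # provides kable()
library(ggplot2)
Admis <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ggplot2/SchoolAdmissions.csv")
# Same data set, with added model-output columns (fit, se.fit, residual.scale, UL, LL, PredictedProb)
Admis2 <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/ggplot2/SchoolAdmissions2.csv")
```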
This data set has a binary response (outcome, dependent) variable called admit. There are three predictor variables: gre, gpa and rank. We will treat the variables gre and gpa as continuous. The variable rank takes on the values 1 through 4: institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. The file also contains a column, rankP, which appears to hold the same predicted probabilities as the PredictedProb column of Admis2. We can get basic descriptive statistics for the entire data set with `summary()`.
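This section does not show how the extra columns in Admis2 were produced. They appear consistent with a logistic regression of admit on gre, gpa and rank (as a factor), with predictions taken on the log-odds (link) scale. The sketch below assumes that model. The helper names `fit_admit_model()` and `add_prediction_columns()` are hypothetical, and the 1.96 interval multiplier is also an assumption.

```r
# Sketch only: assumes a logistic regression with rank treated as categorical.
# fit_admit_model() and add_prediction_columns() are hypothetical helper names.
fit_admit_model <- function(df) {
  glm(admit ~ gre + gpa + factor(rank),
      family = binomial(link = "logit"), data = df)
}

# Adds link-scale predictions, their standard errors and an approximate 95%
# interval on the probability scale, using the same column names as Admis2.
add_prediction_columns <- function(model, df, z = 1.96) {
  p <- predict(model, newdata = df, type = "link", se.fit = TRUE)
  df$fit <- as.numeric(p$fit)              # log-odds of admission
  df$se.fit <- as.numeric(p$se.fit)        # standard error of the log-odds
  df$residual.scale <- p$residual.scale    # 1 for a binomial model
  df$UL <- plogis(df$fit + z * df$se.fit)  # upper limit, probability scale
  df$LL <- plogis(df$fit - z * df$se.fit)  # lower limit, probability scale
  df$PredictedProb <- plogis(df$fit)       # inverse logit of the fit
  df
}

# Usage with the data loaded above:
# model <- fit_admit_model(Admis)
# head(add_prediction_columns(model, Admis))
```

Computing the interval on the log-odds scale and then applying `plogis()` keeps both limits inside [0, 1]. An interval built directly on the probability scale would not guarantee that.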
```r
summary(Admis) %>% kable()
```

admit | gre | gpa | rank | rankP
---|---|---|---|---
Min. :0.0000 | Min. :220.0 | Min. :2.260 | Min. :1.000 | Min. :0.05879
1st Qu.:0.0000 | 1st Qu.:520.0 | 1st Qu.:3.130 | 1st Qu.:2.000 | 1st Qu.:0.19543
Median :0.0000 | Median :580.0 | Median :3.395 | Median :2.000 | Median :0.29799
Mean :0.3175 | Mean :587.7 | Mean :3.390 | Mean :2.485 | Mean :0.31750
3rd Qu.:1.0000 | 3rd Qu.:660.0 | 3rd Qu.:3.670 | 3rd Qu.:3.000 | 3rd Qu.:0.40139
Max. :1.0000 | Max. :800.0 | Max. :4.000 | Max. :4.000 | Max. :0.73841
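Because admit is coded 0/1, its mean of 0.3175 in the summary is the share of applicants who were admitted. The summary treats rank as a number, so a mean rank of 2.485 is of limited use. Admission counts and rates per rank are usually more informative. Below is a minimal sketch using base R's `tapply()` and `xtabs()`. The helper names are hypothetical.

```r
# Share of applicants admitted within each rank (admit is coded 0/1,
# so the mean of admit within a group is a proportion).
admission_rate_by_rank <- function(df) {
  tapply(df$admit, df$rank, mean)
}

# Two-way table of counts: rows are admit (0/1), columns are rank (1-4).
admission_table <- function(df) {
  xtabs(~ admit + rank, data = df)
}

# Usage with the full data set:
# admission_rate_by_rank(Admis)
# admission_table(Admis)
```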
```r
head(Admis2) %>% kable()
```
admit | gre | gpa | rank | rankP | fit | se.fit | residual.scale | UL | LL | PredictedProb |
---|---|---|---|---|---|---|---|---|---|---|
0 | 380 | 3.61 | 3 | 0.1726265 | -1.5671256 | 0.3359675 | 1 | 0.2872804 | 0.0974731 | 0.1726265 |
1 | 660 | 3.67 | 3 | 0.2921750 | -0.8848442 | 0.2297198 | 1 | 0.3930300 | 0.2083178 | 0.2921750 |
1 | 800 | 4.00 | 1 | 0.7384082 | 1.0377118 | 0.3480671 | 1 | 0.8481189 | 0.5879508 | 0.7384082 |
1 | 640 | 3.19 | 4 | 0.1783846 | -1.5273305 | 0.3373684 | 1 | 0.2960689 | 0.1007814 | 0.1783846 |
0 | 520 | 2.93 | 4 | 0.1183539 | -2.0081113 | 0.3552036 | 1 | 0.2121670 | 0.0627195 | 0.1183539 |
1 | 760 | 3.00 | 2 | 0.3699699 | -0.5323458 | 0.3023582 | 1 | 0.5150645 | 0.2450910 | 0.3699699 |
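The interval columns can be checked against fit and se.fit. In the six rows above, PredictedProb equals `plogis(fit)`, and LL and UL match `plogis(fit ± 1.96 * se.fit)`. This is a Wald interval on the log-odds scale mapped back to probabilities. The 1.96 multiplier is inferred from these rows, not stated in the source. The helper `link_to_prob_ci()` is hypothetical.

```r
# Converts link-scale (log-odds) predictions into probabilities with an
# approximate 95% interval. z = 1.96 is an assumption that matches the table.
link_to_prob_ci <- function(fit, se.fit, z = 1.96) {
  data.frame(
    PredictedProb = plogis(fit),    # inverse logit of the fit
    LL = plogis(fit - z * se.fit),  # lower limit
    UL = plogis(fit + z * se.fit)   # upper limit
  )
}

# fit and se.fit for the six rows of Admis2 shown above
fit    <- c(-1.5671256, -0.8848442, 1.0377118, -1.5273305, -2.0081113, -0.5323458)
se.fit <- c(0.3359675, 0.2297198, 0.3480671, 0.3373684, 0.3552036, 0.3023582)
round(link_to_prob_ci(fit, se.fit), 7)
```

The printed values should agree closely with the PredictedProb, LL and UL columns of the table.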
```r
# Predicted probability of admission against GRE score, coloured by rank
ggplot(Admis2, aes(x = gre, y = PredictedProb)) + geom_point(aes(colour = factor(rank)))
```
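```r
# Same points plus one overall linear trend line with its confidence band.
# (A numeric rank mapped to fill does not split the data into groups, so it is omitted.)
ggplot(Admis2, aes(x = gre, y = PredictedProb)) +
  geom_point(aes(colour = factor(rank))) +
  geom_smooth(method = "lm", formula = y ~ x, linewidth = 1)  # linewidth needs ggplot2 >= 3.4.0
```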
```r
# One linear trend line per rank, without confidence bands
ggplot(Admis2, aes(x = gre, y = PredictedProb)) +
  geom_point(aes(colour = factor(rank))) +
  geom_smooth(aes(colour = factor(rank)), method = "lm", formula = y ~ x, se = FALSE, linewidth = 1)
```
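The straight lines above summarise the predicted probabilities, but Admis2 also carries the interval columns LL and UL. One way to show them is a vertical error bar for each applicant. Below is a sketch with a hypothetical helper, `plot_pred_with_ci()`. It uses only the column names shown in the table.

```r
library(ggplot2)

# Predicted probability against GRE, with the LL/UL interval drawn as error bars.
plot_pred_with_ci <- function(df) {
  ggplot(df, aes(x = gre, y = PredictedProb, colour = factor(rank))) +
    geom_errorbar(aes(ymin = LL, ymax = UL), alpha = 0.3, width = 0) +
    geom_point() +
    labs(colour = "rank", y = "Predicted probability of admission")
}

# Usage:
# plot_pred_with_ci(Admis2)
```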