In this analysis I will study the earnings of elite schools versus non-elite schools in the US. “Elite” defined here will be a school with a less than 35% admissions rate and an SAT average above 1300 (a subjective definition but one that I think is good enough for this analysis). Only 58 schools reach this standard. The majority of these schools are private non-profit institutions.
There are caveats with the data however. This data only includes students who took out federal financial aid. At the most elite institutions very few take out federal financial aid. Also, the earnings data tracks students who attended a particular university in 2001 (on federal aid) and matches up with their tax returns in 2011. This is important to note because it is uncertain whether this analysis could be practically used to make a sound financial decision today as currents trends in the labor market certainly affect university outcomes. Lastly, there is no indication of degree attainment in the data. I presume this data includes people who took out loans who never completed their degree and people who went on to graduate school and earned an advanced degree.
db <- dbConnect(dbDriver("SQLite"), "C:/Users/Christopher Pearson/Desktop/college-scorecard-release-2015-09-23-15-08-57/output/database.sqlite")
dbGetQuery(db, "PRAGMA temp_store=2;")
sample <- dbGetQuery(db,"
SELECT sc13.INSTNM Name,
sc13.CONTROL CollegeType,
sc13.ADM_RATE AdRate,
sc13.SATVR75 SeventyFiveR,
sc13.SATMT75 SeventyFiveM,
sc13.SAT_AVG SATAvg,
sc13.PCTFLOAN PctFinAid,
sc11.md_earn_wne_p10 MedEarning10yr,
sc11.pct25_earn_wne_p10 twentyfivePctEarn10yr,
sc11.pct10_earn_wne_p10 p10,
sc11.pct75_earn_wne_p10 p75,
sc11.pct90_earn_wne_p10 p90
FROM Scorecard sc13
INNER JOIN Scorecard sc11 ON sc13.UNITID = sc11.UNITID
WHERE sc13.Year = 2013
AND sc11.Year = 2011
AND sc13.ADM_RATE < 0.35
AND sc13.SAT_AVG > 1300
AND sc13.CUML_DEBT_P90 != 'PrivacySuppressed'
AND sc13.PREDDEG = 'Predominantly bachelor''s-degree granting'
")
all_schools <- dbGetQuery(db,"
SELECT sc13.INSTNM Name,
sc11.md_earn_wne_p10 MedEarning10yr,
sc11.pct25_earn_wne_p10 twentyfivePctEarn10yr,
sc11.pct10_earn_wne_p10 a10,
sc11.pct75_earn_wne_p10 a75,
sc11.pct90_earn_wne_p10 a90,
sc11.SAT_AVG SATAvg,
sc11.ADM_RATE AdRate
FROM Scorecard sc13
INNER JOIN Scorecard sc11 ON sc13.UNITID = sc11.UNITID
WHERE sc13.Year = 2013
AND sc11.Year = 2011
AND sc13.CUML_DEBT_P90 != 'PrivacySuppressed'
AND sc13.PREDDEG = 'Predominantly bachelor''s-degree granting'
AND sc11.ADM_RATE != 'NA'
AND sc11.md_earn_wne_p10 != 'NA'
")
Loading the data using a SQL query. I will save elite schools in the sample variable and data for all schools in the all_schools variable. An inner join was used because the latest scorecard doesn’t have earnings data but 2011 does. Also, I made sure that only 4-yr universities were counted (no 2yr schools, medical schools, etc.,)
all10 <- median(all_schools[['a10']], na.rm = TRUE)
all25 <- median(all_schools[['twentyfivePctEarn10yr']], na.rm = TRUE)
all50 <- median(all_schools[['MedEarning10yr']], na.rm = TRUE)
all75 <- median(all_schools[['a75']], na.rm = TRUE)
all90 <- median(all_schools[['a90']], na.rm = TRUE)
allschoolsr <- c('All Schools', all10, all25, all50, all75, all90)
sample_all <- sample[,c(1,10,9,8,11,12)]
sample_all <- rbind(sample_all, allschoolsr)
sample_all[, c(2:6)] <- sapply(sample_all[, c(2:6)], as.numeric)
sample_all$SchoolName <- reorder(sample_all$Name, sample_all$MedEarning10yr)
sample_all_bottom <- subset(sample_all, MedEarning10yr <= median(sample_all$MedEarning10yr))
sample_all_bottom <- sample_all_bottom[order(sample_all_bottom$MedEarning10yr),]
rownames(sample_all_bottom) <- NULL
sample_all_bottom[,c('Name', 'MedEarning10yr')]
## Name MedEarning10yr
## 1 Oberlin College 38400
## 2 All Schools 40700
## 3 Colorado College 41100
## 4 Macalester College 45700
## 5 Carleton College 46100
## 6 Vassar College 47100
## 7 Swarthmore College 49400
## 8 University of North Carolina at Chapel Hill 50400
## 9 Wesleyan University 50900
## 10 Tulane University of Louisiana 52400
## 11 Pomona College 52600
## 12 Colby College 53000
## 13 Middlebury College 53400
## 14 Cooper Union for the Advancement of Science and Art 53700
## 15 Bowdoin College 54800
## 16 Haverford College 55600
## 17 College of William and Mary 56400
## 18 Wellesley College 56700
## 19 Amherst College 56800
## 20 Hamilton College 57300
## 21 Barnard College 57400
## 22 University of Michigan-Ann Arbor 57900
## 23 Williams College 58100
## 24 Davidson College 58500
## 25 University of Virginia-Main Campus 58600
## 26 New York University 58800
## 27 Emory University 59000
## 28 Brown University 59700
## 29 Rice University 59900
## 30 Northeastern University 60100
Compared to all schools only Oberlin College has a lower median than all schools ten years after initial attendance. The majority of elite schools with lower median earnings are liberal arts colleges. A potential explanation for this is that these schools traditionally are not solely meant to prepare one for employers.
> eliteschool_plot <- ggplot(sample, aes(SATAvg, MedEarning10yr)) + geom_point(aes(colour = Name)) + geom_smooth() + xlab("Elite School SAT Average") + ylab("Median Yearly Earnings 10 Years After Attendance") + theme(legend.position = "none")
>
> ggplotly(width = 1000, height = 600)
Scatter plot of elite schools with SAT average on the x axis and median earnings ten years after initial attendance on the y. There is a somewhat positive linear relationship however there is a lot of noise. Also, around an SAT average of 1300 there tends to be higher earnings than around 1350.
eliteschool_adrate <- ggplot(sample, aes(AdRate, MedEarning10yr)) + geom_point(aes(colour = Name)) + geom_smooth() + xlab("Elite School Admissions Rate") + ylab("Median Yearly Earnings 10 Years After Attendance") + theme(legend.position = "none")
ggplotly(width = 1000, height = 600)
Scatter plot of elite schools with admissions rate on the x and median earnings ten years after initial attendance on the y. There tends to be a negative relationship between the two. There is however much variation.
corrgram(sample, order = TRUE, upper.panel = panel.pie)
Correlation matrix for all schools. There is a good negative relationship between admissions rate and median earnings 10 years out. Also, median earnings ten years out tend to be positively related with SAT average and SAT math at the 75th percentile. Interestingly, SAT reading at the 75th percentile is not as strongly correlated as SAT math 75th percentile for earnings.
all_schools <- subset(all_schools, MedEarning10yr != 0)
ggplot(all_schools, aes(SATAvg, MedEarning10yr)) + geom_point() + geom_smooth() + xlab("All Schools SAT Average") + ylab("Median Yearly Earnings 10 Years After Attendance")
Scatter plot of all schools with the SAT average of the school on the and median earnings ten years after initial attendance on the y. There appears to be a pretty good positive linear relationship here. Surprisingly, the highest earning schools are near an SAT of 1100.
ggplot(all_schools, aes(AdRate, MedEarning10yr)) + geom_point() + geom_smooth() + xlab("All Schools Admissions Rate") + ylab("Median Yearly Earnings 10 Years After Attendance")
Scatter plot of all schools with admissions rate on the x and median earnings ten years out on the y. From ~5% and 25% there is a negative relationship but around ~35% and beyond earnings tend to flatten out.
> schoolbox <- ggplot(aes(SchoolName),data=sample_all) + geom_boxplot(
+ aes(ymin=p10, lower=twentyfivePctEarn10yr, middle =MedEarning10yr, upper=p75, ymax=p90),
+ stat="identity", color = 'Blue') + theme(axis.text.y = element_text(size=7.25))
>
> schoolbox + coord_flip() + ggtitle("Median Yearly Earnings 10 Years After Attendance") + xlab("School Name") + ylab("Yearly Earnings $")
Box plot of elite schools by median earnings ten years out. No surprise that Harvard and MIT occupy the top two spots. What is interesting though is that while the median students at elite schools outperform students at all schools the students at the 25th percentile appear to earn no more than the median student at all schools. Also, the 10th percentile at elite schools is similar to the 10th percentile at all schools. What is unique about this graph especially is that only a handful of elite schools have a very high 90th percentile, MIT, Harvard, Stanford, and UPenn all have 90th percentiles above $225000.
sample$Elite <- rep('Elite',nrow(sample))
allschools_nozero <- subset(all_schools, MedEarning10yr != 0 & AdRate >= 0.35 & SATAvg <= 1300)
allschools_nozero$Elite <- rep('Non-Elite', nrow(allschools_nozero))
eliteschools <- data.frame(sample$MedEarning10yr, sample$Elite)
nonelite_schools <- data.frame(allschools_nozero$MedEarning10yr,allschools_nozero$Elite)
options(scipen = 20)
qplot(allschools_nozero$MedEarning10yr, geom = "histogram", binwidth = 1000, xlim = c(15000, 125000), fill = I('blue'),
xlab = 'Median Yearly Earnings 10yrs After Attendance for All Schools', alpha = I(0.5))
qplot(sample$MedEarning10yr, geom = 'histogram', binwidth = 3000, xlim = c(30000, 100000), fill = I('blue'),
xlab = 'Median Yearly Earnings 10yrs After Attendance for Elite Schools', alpha = I(0.5))
names(eliteschools) <- c("MedEarning10yr", "Elite")
names(nonelite_schools) <- c("MedEarning10yr", "Elite")
nonelite_and_elite <- rbind(eliteschools, nonelite_schools)
ggplot(nonelite_and_elite, aes(MedEarning10yr, colour = Elite, fill = Elite)) + geom_density(alpha = 0.1) + xlab("Median Yearly Earnings 10 yrs After Attendance")
Probability densities for elite and non-elite schools. As shown, the average elite school has a much higher potential for earnings than non-elite schools. However, there is much overlap for the lower quarter of elite schools and the top quarter for non-elite schools. The upper bound for elite schools is very high yet there a small number of non-elite schools that are close to that upper bound (the long tail on the right for the non-elite distribution). It clear that non-elite schools have a much lower lower bound, less than $25000.
allschools_nozero <- allschools_nozero[order(-allschools_nozero$MedEarning10yr),]
rownames(allschools_nozero) <- NULL
allschools_nozero[1:50, c(1,2)]
## Name MedEarning10yr
## 1 MCPHS University 116400
## 2 Albany College of Pharmacy and Health Sciences 110600
## 3 University of the Sciences 85800
## 4 Stevens Institute of Technology 82800
## 5 California Maritime Academy 81100
## 6 Massachusetts Maritime Academy 79500
## 7 SUNY Maritime College 77300
## 8 Bentley University 74900
## 9 Kettering University 74900
## 10 Colorado School of Mines 74700
## 11 Maine Maritime Academy 74700
## 12 University of Colorado Denver 73800
## 13 Polytechnic Institute of New York University 73500
## 14 Lafayette College 69800
## 15 Clarkson University 68400
## 16 Illinois Institute of Technology 68200
## 17 Santa Clara University 67700
## 18 University of the Pacific 66400
## 19 Milwaukee School of Engineering 65700
## 20 Missouri University of Science and Technology 65500
## 21 New Jersey Institute of Technology 65300
## 22 Loyola University Maryland 62100
## 23 Bryant University 61900
## 24 Drexel University 61100
## 25 University of San Francisco 61000
## 26 Embry-Riddle Aeronautical University-Prescott 60900
## 27 Embry-Riddle Aeronautical University-Daytona Beach 60900
## 28 Boston University 60600
## 29 Michigan Technological University 60100
## 30 University of California-San Diego 59600
## 31 Providence College 59600
## 32 University of Maryland-College Park 59100
## 33 Saint Mary's College of California 59000
## 34 Capitol College 58900
## 35 Stonehill College 58800
## 36 Saint Joseph's University 58500
## 37 Pace University-New York 58400
## 38 SUNY at Binghamton 58400
## 39 Virginia Polytechnic Institute and State University 57900
## 40 Creighton University 57800
## 41 Mount Carmel College of Nursing 57700
## 42 Molloy College 57500
## 43 Brigham Young University-Provo 57400
## 44 University of California-Davis 57100
## 45 George Mason University 57000
## 46 Rochester Institute of Technology 56900
## 47 The College of New Jersey 56800
## 48 University of Illinois at Urbana-Champaign 56600
## 49 Syracuse University 56600
## 50 Virginia Military Institute 56600
As shown, many of the highest earning non-elite schools are very technical/vocational in nature. Most of these schools are pharmacy, maritime, engineering, and business schools which would indicate major field of study as a big factor in earnings.
allschools_nozero$SchoolName <- reorder(allschools_nozero$Name, allschools_nozero$MedEarning10yr)
allschool_box <- ggplot(aes(SchoolName),data=allschools_nozero[1:50,]) + geom_boxplot(
aes(ymin=a10, lower=twentyfivePctEarn10yr, middle =MedEarning10yr, upper=a75, ymax=a90),
stat="identity", color = 'Blue') + theme(axis.text.y = element_text(size=8))
allschool_box + coord_flip() + ggtitle("Non-Elite Schools 10yrs After Attendance") + xlab("School Name") + ylab("Yearly Earnings $")
A box plot of the highest earning non-elite schools so that the variation form the median can be seen.