In this analysis I will study the earnings of elite schools versus non-elite schools in the US. “Elite” defined here will be a school with a less than 35% admissions rate and an SAT average above 1300 (a subjective definition but one that I think is good enough for this analysis). Only 58 schools reach this standard. The majority of these schools are private non-profit institutions.

There are caveats with the data however. This data only includes students who took out federal financial aid. At the most elite institutions very few take out federal financial aid. Also, the earnings data tracks students who attended a particular university in 2001 (on federal aid) and matches up with their tax returns in 2011. This is important to note because it is uncertain whether this analysis could be practically used to make a sound financial decision today as currents trends in the labor market certainly affect university outcomes. Lastly, there is no indication of degree attainment in the data. I presume this data includes people who took out loans who never completed their degree and people who went on to graduate school and earned an advanced degree.

db <- dbConnect(dbDriver("SQLite"), "C:/Users/Christopher Pearson/Desktop/college-scorecard-release-2015-09-23-15-08-57/output/database.sqlite")
 

dbGetQuery(db, "PRAGMA temp_store=2;")

sample <- dbGetQuery(db,"
                     SELECT sc13.INSTNM Name,
                            sc13.CONTROL CollegeType,
                            sc13.ADM_RATE AdRate, 
                            sc13.SATVR75 SeventyFiveR,
                            sc13.SATMT75 SeventyFiveM,
                            sc13.SAT_AVG SATAvg,
                            sc13.PCTFLOAN PctFinAid,
                            sc11.md_earn_wne_p10 MedEarning10yr,
                            sc11.pct25_earn_wne_p10 twentyfivePctEarn10yr,
                            sc11.pct10_earn_wne_p10 p10,
                            sc11.pct75_earn_wne_p10 p75,
                            sc11.pct90_earn_wne_p10 p90
                     FROM Scorecard sc13
                     INNER JOIN Scorecard sc11 ON sc13.UNITID = sc11.UNITID
                     WHERE sc13.Year = 2013
                       AND sc11.Year = 2011 
                       AND sc13.ADM_RATE < 0.35
                       AND sc13.SAT_AVG > 1300
                       AND sc13.CUML_DEBT_P90 != 'PrivacySuppressed'
                       AND sc13.PREDDEG = 'Predominantly bachelor''s-degree granting'
                     
                     ")

all_schools <- dbGetQuery(db,"
                         SELECT sc13.INSTNM Name,
                                sc11.md_earn_wne_p10 MedEarning10yr,
                                sc11.pct25_earn_wne_p10 twentyfivePctEarn10yr,
                                sc11.pct10_earn_wne_p10 a10,
                                sc11.pct75_earn_wne_p10 a75,
                                sc11.pct90_earn_wne_p10 a90,
                                sc11.SAT_AVG SATAvg,
                                sc11.ADM_RATE AdRate
                         FROM Scorecard sc13
                         INNER JOIN Scorecard sc11 ON sc13.UNITID = sc11.UNITID
                         WHERE sc13.Year = 2013
                          AND sc11.Year = 2011
                          AND sc13.CUML_DEBT_P90 != 'PrivacySuppressed'
                          AND sc13.PREDDEG = 'Predominantly bachelor''s-degree granting'
                          AND sc11.ADM_RATE != 'NA'
                          AND sc11.md_earn_wne_p10 != 'NA'
                         ")

Loading the data using a SQL query. I will save elite schools in the sample variable and data for all schools in the all_schools variable. An inner join was used because the latest scorecard doesn’t have earnings data but 2011 does. Also, I made sure that only 4-yr universities were counted (no 2yr schools, medical schools, etc.,)

Lowest Earning Elite Schools

all10 <- median(all_schools[['a10']], na.rm = TRUE)
all25 <- median(all_schools[['twentyfivePctEarn10yr']], na.rm = TRUE)
all50 <- median(all_schools[['MedEarning10yr']], na.rm = TRUE)
all75 <- median(all_schools[['a75']], na.rm = TRUE)
all90 <- median(all_schools[['a90']], na.rm = TRUE)

allschoolsr <- c('All Schools', all10, all25, all50, all75, all90)

sample_all <- sample[,c(1,10,9,8,11,12)]
sample_all <- rbind(sample_all, allschoolsr)
sample_all[, c(2:6)] <- sapply(sample_all[, c(2:6)], as.numeric)
sample_all$SchoolName <- reorder(sample_all$Name, sample_all$MedEarning10yr) 

sample_all_bottom <- subset(sample_all, MedEarning10yr <= median(sample_all$MedEarning10yr))

sample_all_bottom <- sample_all_bottom[order(sample_all_bottom$MedEarning10yr),]

rownames(sample_all_bottom) <- NULL

sample_all_bottom[,c('Name', 'MedEarning10yr')]
##                                                   Name MedEarning10yr
## 1                                      Oberlin College          38400
## 2                                          All Schools          40700
## 3                                     Colorado College          41100
## 4                                   Macalester College          45700
## 5                                     Carleton College          46100
## 6                                       Vassar College          47100
## 7                                   Swarthmore College          49400
## 8          University of North Carolina at Chapel Hill          50400
## 9                                  Wesleyan University          50900
## 10                      Tulane University of Louisiana          52400
## 11                                      Pomona College          52600
## 12                                       Colby College          53000
## 13                                  Middlebury College          53400
## 14 Cooper Union for the Advancement of Science and Art          53700
## 15                                     Bowdoin College          54800
## 16                                   Haverford College          55600
## 17                         College of William and Mary          56400
## 18                                   Wellesley College          56700
## 19                                     Amherst College          56800
## 20                                    Hamilton College          57300
## 21                                     Barnard College          57400
## 22                    University of Michigan-Ann Arbor          57900
## 23                                    Williams College          58100
## 24                                    Davidson College          58500
## 25                  University of Virginia-Main Campus          58600
## 26                                 New York University          58800
## 27                                    Emory University          59000
## 28                                    Brown University          59700
## 29                                     Rice University          59900
## 30                             Northeastern University          60100

Compared to all schools only Oberlin College has a lower median than all schools ten years after initial attendance. The majority of elite schools with lower median earnings are liberal arts colleges. A potential explanation for this is that these schools traditionally are not solely meant to prepare one for employers.

What Predicts Earnings?

> eliteschool_plot <- ggplot(sample, aes(SATAvg, MedEarning10yr)) + geom_point(aes(colour = Name)) + geom_smooth() + xlab("Elite School SAT Average") + ylab("Median Yearly Earnings 10 Years After Attendance") + theme(legend.position = "none")
> 
> ggplotly(width = 1000, height = 600)

Scatter plot of elite schools with SAT average on the x axis and median earnings ten years after initial attendance on the y. There is a somewhat positive linear relationship however there is a lot of noise. Also, around an SAT average of 1300 there tends to be higher earnings than around 1350.

eliteschool_adrate <- ggplot(sample, aes(AdRate, MedEarning10yr)) + geom_point(aes(colour = Name)) + geom_smooth() + xlab("Elite School Admissions Rate") + ylab("Median Yearly Earnings 10 Years After Attendance") + theme(legend.position = "none")

ggplotly(width = 1000, height = 600)

Scatter plot of elite schools with admissions rate on the x and median earnings ten years after initial attendance on the y. There tends to be a negative relationship between the two. There is however much variation.

corrgram(sample, order = TRUE, upper.panel = panel.pie) 

Correlation matrix for all schools. There is a good negative relationship between admissions rate and median earnings 10 years out. Also, median earnings ten years out tend to be positively related with SAT average and SAT math at the 75th percentile. Interestingly, SAT reading at the 75th percentile is not as strongly correlated as SAT math 75th percentile for earnings.

all_schools <- subset(all_schools, MedEarning10yr != 0)

ggplot(all_schools, aes(SATAvg, MedEarning10yr)) + geom_point() + geom_smooth() + xlab("All Schools SAT Average") + ylab("Median Yearly Earnings 10 Years After Attendance")

Scatter plot of all schools with the SAT average of the school on the and median earnings ten years after initial attendance on the y. There appears to be a pretty good positive linear relationship here. Surprisingly, the highest earning schools are near an SAT of 1100.

ggplot(all_schools, aes(AdRate, MedEarning10yr)) + geom_point() + geom_smooth() + xlab("All Schools Admissions Rate") + ylab("Median Yearly Earnings 10 Years After Attendance")

Scatter plot of all schools with admissions rate on the x and median earnings ten years out on the y. From ~5% and 25% there is a negative relationship but around ~35% and beyond earnings tend to flatten out.

Earnings by Type of School

> schoolbox <- ggplot(aes(SchoolName),data=sample_all) + geom_boxplot(
+   aes(ymin=p10, lower=twentyfivePctEarn10yr, middle =MedEarning10yr, upper=p75, ymax=p90),
+   stat="identity", color = 'Blue') + theme(axis.text.y = element_text(size=7.25))
> 
> schoolbox + coord_flip() + ggtitle("Median Yearly Earnings 10 Years After Attendance") + xlab("School Name") + ylab("Yearly Earnings $")

Box plot of elite schools by median earnings ten years out. No surprise that Harvard and MIT occupy the top two spots. What is interesting though is that while the median students at elite schools outperform students at all schools the students at the 25th percentile appear to earn no more than the median student at all schools. Also, the 10th percentile at elite schools is similar to the 10th percentile at all schools. What is unique about this graph especially is that only a handful of elite schools have a very high 90th percentile, MIT, Harvard, Stanford, and UPenn all have 90th percentiles above $225000.

sample$Elite <- rep('Elite',nrow(sample))

allschools_nozero <- subset(all_schools, MedEarning10yr != 0 & AdRate >= 0.35 & SATAvg <= 1300)

allschools_nozero$Elite <- rep('Non-Elite', nrow(allschools_nozero))

eliteschools <- data.frame(sample$MedEarning10yr, sample$Elite)
nonelite_schools <- data.frame(allschools_nozero$MedEarning10yr,allschools_nozero$Elite)

options(scipen = 20)

qplot(allschools_nozero$MedEarning10yr, geom = "histogram", binwidth = 1000, xlim = c(15000, 125000), fill = I('blue'),
      xlab = 'Median Yearly Earnings 10yrs After Attendance for All Schools', alpha = I(0.5))

qplot(sample$MedEarning10yr, geom = 'histogram', binwidth = 3000, xlim = c(30000, 100000), fill = I('blue'),
      xlab = 'Median Yearly Earnings 10yrs After Attendance for Elite Schools', alpha = I(0.5))

names(eliteschools) <- c("MedEarning10yr", "Elite")
names(nonelite_schools) <- c("MedEarning10yr", "Elite")

nonelite_and_elite <- rbind(eliteschools, nonelite_schools)

ggplot(nonelite_and_elite, aes(MedEarning10yr, colour = Elite, fill = Elite)) + geom_density(alpha = 0.1) + xlab("Median Yearly Earnings 10 yrs After Attendance")

Probability densities for elite and non-elite schools. As shown, the average elite school has a much higher potential for earnings than non-elite schools. However, there is much overlap for the lower quarter of elite schools and the top quarter for non-elite schools. The upper bound for elite schools is very high yet there a small number of non-elite schools that are close to that upper bound (the long tail on the right for the non-elite distribution). It clear that non-elite schools have a much lower lower bound, less than $25000.

Top 50 Highest Earning Non-Elite Schools

allschools_nozero <- allschools_nozero[order(-allschools_nozero$MedEarning10yr),]
rownames(allschools_nozero) <- NULL
allschools_nozero[1:50, c(1,2)]
##                                                   Name MedEarning10yr
## 1                                     MCPHS University         116400
## 2       Albany College of Pharmacy and Health Sciences         110600
## 3                           University of the Sciences          85800
## 4                      Stevens Institute of Technology          82800
## 5                          California Maritime Academy          81100
## 6                       Massachusetts Maritime Academy          79500
## 7                                SUNY Maritime College          77300
## 8                                   Bentley University          74900
## 9                                 Kettering University          74900
## 10                            Colorado School of Mines          74700
## 11                              Maine Maritime Academy          74700
## 12                       University of Colorado Denver          73800
## 13        Polytechnic Institute of New York University          73500
## 14                                   Lafayette College          69800
## 15                                 Clarkson University          68400
## 16                    Illinois Institute of Technology          68200
## 17                              Santa Clara University          67700
## 18                           University of the Pacific          66400
## 19                     Milwaukee School of Engineering          65700
## 20       Missouri University of Science and Technology          65500
## 21                  New Jersey Institute of Technology          65300
## 22                          Loyola University Maryland          62100
## 23                                   Bryant University          61900
## 24                                   Drexel University          61100
## 25                         University of San Francisco          61000
## 26       Embry-Riddle Aeronautical University-Prescott          60900
## 27  Embry-Riddle Aeronautical University-Daytona Beach          60900
## 28                                   Boston University          60600
## 29                   Michigan Technological University          60100
## 30                  University of California-San Diego          59600
## 31                                  Providence College          59600
## 32                 University of Maryland-College Park          59100
## 33                  Saint Mary's College of California          59000
## 34                                     Capitol College          58900
## 35                                   Stonehill College          58800
## 36                           Saint Joseph's University          58500
## 37                            Pace University-New York          58400
## 38                                  SUNY at Binghamton          58400
## 39 Virginia Polytechnic Institute and State University          57900
## 40                                Creighton University          57800
## 41                     Mount Carmel College of Nursing          57700
## 42                                      Molloy College          57500
## 43                      Brigham Young University-Provo          57400
## 44                      University of California-Davis          57100
## 45                             George Mason University          57000
## 46                   Rochester Institute of Technology          56900
## 47                           The College of New Jersey          56800
## 48          University of Illinois at Urbana-Champaign          56600
## 49                                 Syracuse University          56600
## 50                         Virginia Military Institute          56600

As shown, many of the highest earning non-elite schools are very technical/vocational in nature. Most of these schools are pharmacy, maritime, engineering, and business schools which would indicate major field of study as a big factor in earnings.

allschools_nozero$SchoolName <- reorder(allschools_nozero$Name, allschools_nozero$MedEarning10yr) 

allschool_box <- ggplot(aes(SchoolName),data=allschools_nozero[1:50,]) + geom_boxplot(
  aes(ymin=a10, lower=twentyfivePctEarn10yr, middle =MedEarning10yr, upper=a75, ymax=a90),
  stat="identity", color = 'Blue') + theme(axis.text.y = element_text(size=8))

allschool_box + coord_flip() + ggtitle("Non-Elite Schools 10yrs After Attendance") + xlab("School Name") + ylab("Yearly Earnings $")

A box plot of the highest earning non-elite schools so that the variation form the median can be seen.