Introduction

Question and Setup

Do Americans conceive of a class of elite universities distinct from the rest, and if so, what does that class look like? To find out, I asked them. Specifically, I displayed universities two at a time to 1,250 Americans and asked them which, if either, they considered the more elite.

The National Center for Education Statistics counts 2,914 four-year, undergraduate degree-granting institutions in the 2015–16 academic year: too many to include in a survey, and most sound made up. (No offense, Wayland Baptist University.) To whittle that list down to a manageable number of schools whose names have some chance of lurking within the average American’s consciousness, I limited my survey to the 89 schools that secured a top-200 spot in both the US News & World Report and Forbes 2016 college rankings. Superior writers have devoted much ink to college rankings’ flawed methodologies and harmful incentives. Dangerous and stupid though they are, college rankings are omnipresent and popular, and my final list of 89 includes schools of all types, sizes, endowments, and affiliations, from all regions of the US. It is, however, a bit heavy on research institutions and light on straight liberal arts programs.

My head-to-head or “pairwise” approach to assessing elite status maximizes the information that I can wring from survey participants, is difficult to manipulate, is a conservative test of my theory, and is distressingly fun. Pitting 89 schools against one another two at a time meant there were 3,916 possible combinations that a participant might encounter. Participants could go through as many or as few match-ups as they wanted, and the match-ups they saw were adaptively presented so as to increase the amount I learned from each new response, given previous responses. Because participants had no say in which match-ups they judged, anyone wanting to “vote up” a favored school had to sit through a great many comparisons before encountering an opportunity to do so. Even then, each school appeared in around 5,000 contests, so while 22% of respondents did at some point encounter a school they attended, any one insincere vote counted for little.
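If you want to check that match-up arithmetic, it is a one-liner in R (the language the rest of my code lives in):

    # Number of unique head-to-head pairings among 89 schools: "89 choose 2"
    choose(89, 2)
    #> [1] 3916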

Whereas survey participants might have some agenda in how they voted, I absolutely had an agenda in asking them to vote. My larger argument rests on the assumption that Americans conceive of a distinct class of elite university. My test of that assumption resists such clustering by encouraging participants to differentiate between schools, even very similar schools. Pairwise comparisons require participants to rank two discrete options (School A and School B) on a specified criterion (elite status). Choosing between Wisconsin and Princeton is easy, but selecting a winner between Wisconsin and Illinois or Princeton and Yale is difficult. If participants routinely rated competing schools as equally elite, we would see those schools clump together in the final elite-ness ranking. That would be great for my argument, but it may reflect participant laziness rather than their honest assessment of school status. The bolder participants are in distinguishing between schools, the more fine-grained my final measure, and the less likely we are to see the pattern I expect to see.1

As you can see in the above screenshot, the default visual design of the online pairwise comparison platform All Our Ideas encourages participants to select a winner in each match-up, which should make clustering of the sort I expect to see less likely. The two contestant schools appear in large blue buttons directly beneath the text asking participants to prioritize between them. The “I can’t decide” button, meanwhile, is small and the same color as the background. Moreover, selecting “I can’t decide” produces an annoying pop-up window wherein the participant must choose one of seven options indicating why she can’t decide. This additional question yields interesting data (e.g., does the participant not know enough about one or both schools to make a judgment, or does she consider them the same?), but it is cumbersome from the participant’s vantage. All of this is meant to ensure that participants select “I can’t decide” only when they really cannot decide. It seemed to work. A mere 7% of the 173,669 votes participants cast were abstentions. The All Our Ideas platform is open source and fully customizable, so interested researchers can, with a little effort, make the “I can’t decide” option as appealing or unappealing as they like. In future work on this topic, I plan to make ties between schools as obvious and direct an option as choosing one or the other school, which I believe will yield a more accurate—but less conservative—assessment of school elite-ness.

Sample and Data

I recruited my survey participants in two separate cohorts during the fall and winter of 2016 using Amazon’s Mechanical Turk. MTurk respondents skew younger and more liberal than the general population, but they are more representative than the convenience samples typically used in academic research, and their well-documented biases ease subsequent statistical analysis. All told, the 500 participants from Sample 1 made 57,426 comparisons and the 750 participants from Sample 2 made 116,243 comparisons. That’s an average of about 115 comparisons/respondent in Sample 1 compared to 155 comparisons/respondent in Sample 2. Differences in remuneration and instructions likely account for the gap in comparisons per respondent between the two samples. I asked Sample 1 participants to make as many comparisons as they liked in exchange for $0.12. I asked, but did not require, Sample 2 participants to spend at least two minutes making comparisons in exchange for $0.08. The lax requirements for payment and the vast number of possible comparisons mean that respondents vary markedly in their contributions. All Our Ideas was designed with this in mind. It and similar wiki surveys are greedy insofar as they record and use as much or as little information as a respondent is willing to give.

You can download all respondent votes and MTurk data at my publicly accessible GitHub repository. Mercifully, All Our Ideas translates the nearly 175,000 observations into the more manageable, interpretable dataset previewed below.2 Given participant responses, weak priors, and two relatively light assumptions,3 All Our Ideas uses Bayesian inference to calculate a school’s estimated elite-ness score. The calculations are fully documented, and—as with the platform itself—the code underlying those calculations is freely available for replication or tweaking.
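If you would like intuition for what the score measures before trusting the platform’s math, the sketch below computes a crude, non-Bayesian analogue from the raw votes: each school’s share of decisive votes won. Because match-ups were presented adaptively, opponents were not drawn uniformly, so treat this as a sanity check rather than a substitute for the real estimates; the column names winner and loser are hypothetical stand-ins for however you arrange the downloaded vote data.

    # Crude analogue of the elite-ness score: share of decisive votes won.
    # This is NOT All Our Ideas' Bayesian estimate; adaptive match-ups mean
    # opponents are not sampled uniformly. Assumes a data frame `votes` with
    # hypothetical columns `winner` and `loser`, one row per decisive vote.
    schools <- union(votes$winner, votes$loser)
    wins    <- table(factor(votes$winner, levels = schools))
    losses  <- table(factor(votes$loser,  levels = schools))
    winRate <- round(100 * wins / (wins + losses))
    head(sort(winRate, decreasing = TRUE), 10)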

school score500 rank500 score750 rank750 diffScore diffRank avgScore avgRank
American University 34 80 32 76 -2 -4 33.0 78.0
Auburn University 49 40 49 40 0 0 49.0 40.0
Baylor University 54 25 58 23 4 -2 56.0 24.0
Binghamton University, SUNY 41 63 33 74 -8 11 37.0 68.5
Boston College 63 17 66 16 3 -1 64.5 16.5
Boston University 63 17 66 16 3 -1 64.5 16.5
Brandeis University 38 72 35 67 -3 -5 36.5 69.5
Brigham Young University 40 66 49 40 9 -26 44.5 53.0
Brown University 67 13 78 9 11 -4 72.5 11.0
California Institute of Technology 56 21 56 26 0 5 56.0 23.5

A school’s score gives the estimated probability that someone would identify it as more elite than a randomly chosen school. The California Institute of Technology, for example, earned a score of 56 in both samples, meaning that Caltech will likely “win” against 56% of the other schools on my list. In addition to elite-ness score, you can think about a school’s elite-ness rank, which is its ordinal position [1–89] relative to other schools. The lower a school’s rank, the higher its score, and the better positioned it is, in terms of perceived elite status, relative to schools of higher rank (and lower score). Caltech’s score of 56 earned it a rank of 21 out of 89 in Sample 1, 26 out of 89 in Sample 2, and an average rank of 23.5 across the two samples. Rank does not show the distance separating schools’ scores, only the order in which they are arrayed. Multi-way ties are possible in both score and rank. If two or more schools hold the same score, they all receive the lowest (i.e., most flattering) possible rank. In the dataset preview above, you can see a school’s score and rank in each sample (denoted by the 500 and 750 suffixes in variable names) as well as the differences between and averages of a school’s score and rank across samples.
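For the mechanically inclined, the rank, difference, and average columns can be reproduced from the two score columns in a few lines of R; here d stands for the previewed dataset, and ties.method = "min" implements the most-flattering-rank rule just described.

    # Rebuild rank and summary columns from the scores. rank() sorts
    # ascending, so negate the score; ties.method = "min" hands tied
    # schools the lowest (most flattering) shared rank.
    d$rank500   <- rank(-d$score500, ties.method = "min")
    d$rank750   <- rank(-d$score750, ties.method = "min")
    d$diffScore <- d$score750 - d$score500
    d$diffRank  <- d$rank750 - d$rank500
    d$avgScore  <- (d$score500 + d$score750) / 2
    d$avgRank   <- (d$rank500 + d$rank750) / 2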


Grouping schools

There are myriad variables by which we could simultaneously group schools: endowment size, student body size, graduation rates, degree offerings, and so on. At this stage of my investigation, I am only interested in the popular perception of a school’s elite status, and can thus partition the 89 schools on that single dimension. I do so first by looking for natural breaks in the density distributions of school elite-ness scores. I then compare the density-based groupings with those produced by k-means clustering techniques.

Estimating breaks with densities

Kernel density estimates (KDEs) are useful in determining the number and approximate location of cut-points for univariate data. To find these cut-points I looked for local minima in the KDEs of the elite-ness scores from each sample. In calculating a KDE, analysts must choose among several sci-fi-sounding kernels (e.g., Epanechnikov, triangular, Gaussian) and they must specify something called a bandwidth. These choices influence the shape of the final density estimate. There are many metrics with which an analyst can justify their kernel and bandwidth, and with which they can impugn others’.
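The minima-hunting step is mechanical enough to sketch here. Assuming one sample’s 89 scores sit in a vector scores, a local minimum is simply a point where the estimated density’s slope flips from negative to positive:

    # Locate local minima of a KDE: x-values where the density's slope
    # changes from negative to positive. `scores` holds one sample's
    # 89 elite-ness scores.
    kde   <- density(scores, kernel = "gaussian", bw = "SJ")
    flips <- diff(sign(diff(kde$y)))          # +2 marks a local minimum
    localMins <- kde$x[which(flips == 2) + 1]
    # Discard minima outside the observed score range (cf. footnote 4)
    localMins[localMins >= min(scores) & localMins <= max(scores)]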

To satisfy as many readers as possible, and to make life easier for them if they want to test their preferred kernel-bandwidth pairings, I present the results—and provide the R code—for 15 common combinations. Below you will see the KDEs and local minima for Gaussian, Epanechnikov, bi-weight, and (optimal) cosine kernels, paired (when possible) with the following bandwidths: “nrd0”, Silverman’s rule of thumb; “sj”, Sheather and Jones’s pilot estimation of derivatives; “ucv”, unbiased cross-validation; and “dpik”, the direct plug-in approach. You can read about the kernels and the first three bandwidths in the documentation for R’s built-in density function, while the “dpik” bandwidth requires the KernSmooth package. If you are not convinced, or if you have theoretical motivations to try a different kernel-bandwidth pairing, please feel free to use my data and tweak my code as you see fit. There are many univariate density estimation packages and methodologies available in R, which capable methodologists have evaluated.
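Here is my reconstruction of how those 15 pairings can be generated: the three selection rules are built into density(), while KernSmooth’s dpik() returns a numeric bandwidth and offers no cosine kernel, which is why the count is 15 rather than 16. Treat it as a sketch; the canonical version is in my linked code.

    # Build 15 kernel-bandwidth pairings: 4 kernels x 3 built-in selection
    # rules, plus dpik() bandwidths for the 3 kernels dpik() supports.
    library(KernSmooth)
    kernels <- c("gaussian", "epanechnikov", "biweight", "optcosine")
    rules   <- c("nrd0", "SJ", "ucv")
    kdes <- list()
    for (k in kernels)
      for (b in rules)
        kdes[[paste(k, b)]] <- density(scores, kernel = k, bw = b)
    # dpik() has no cosine option, hence "paired (when possible)"
    dp <- c(gaussian = "normal", epanechnikov = "epanech", biweight = "biweight")
    for (k in names(dp))
      kdes[[paste(k, "dpik")]] <- density(scores, kernel = k,
                                          bw = dpik(scores, kernel = dp[[k]]))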

The presence of local minima in the densities suggests that there is a “natural break” thereabouts in the data, but the exact location of the break is not so clear. That local minima reliably occur in the same narrow range of a density curve across various kernel-bandwidth pairs is telling, but where precisely they fall along the x-axis is finicky. For example, the thirty densities I calculate here produce 129 turning points that lie outside the range of observed school scores.4 More important than calculating the precise value at which to divide clusters is determining whether we should group the schools into clusters at all, and if so, into how many. After that, we’ll calculate the average location of viable minima (i.e., those that reliably occur in the same range across kernel-bandwidth pairs) to give ourselves a starting point from which to assign schools to clusters, should they appear to exist.

Sample 1

In eight of the Sample 1 densities we see local minima occurring at scores just above 60. On two occasions, a pair of local minima sits so tightly grouped just above that score that I presume their doubleness an artifact of the bandwidth and treat them as one. Additionally, in two densities there are local minima at scores below 50, which I treat as outliers. In seven of the densities, there are no local minima within the range of observed scores.

Sample 2

In twelve of the Sample 2 densities we see local minima occurring at scores just below 75. In three cases, there are multiple minima just below that score, but they are so tightly grouped that I consider their separateness an artifact of the bandwidth and group them together. In four of the densities, there are additional local minima slightly above 75, tightly bunched when they number more than one. Additionally, in one of the densities there is a local minimum at a score below 50, which I treat as an outlier. In three of the densities, there are no local minima within the range of observed scores.

Density-based conclusions

The takeaway from the density method of calculating natural breaks: We can be reasonably confident dividing the schools into two clusters in Sample 1 and quite confident dividing the schools into two clusters in Sample 2. There is little reason to think that Sample 1 demands more than two clusters, but there is some support for forgoing clustering altogether. In Sample 2, density analysis yields meager evidence for either more or fewer than two clusters. Moving forward with the majority-supported two clusters, the mean locations of the viable minima are 65.6 for Sample 1 and 71.5 for Sample 2.

The general shapes of the densities themselves are suggestive of two clusters in each sample. Ignoring the exact location of local minima for a moment, the density curves take on a distinct shape right around a score of 60 in Sample 1 and right around a score of 70 in Sample 2. These scores will serve as my visually approximated “eyeball” cut-points, useful for comparison with the more formal density minima approach, as well as the alternative approaches detailed momentarily.

Comparing alternative cut-points

The density method for estimating natural breaks favors two clusters of schools in each sample. As such, I next calculate cluster breaks using k-means methods, assuming two clusters. K-means and related clustering techniques assign data to one of k groups (in this case, k = 2) so as to minimize the within-group distances separating observations. Jenks and Fisher are closely related clustering methods within the broader k-means family and can be found in the classInt package. I also present results from a straight-ahead application of the k-means method using the Ckmeans.1d.dp package.
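In R, all three cut-points take only a few lines; classInt handles Jenks and Fisher, and Ckmeans.1d.dp handles the k-means proper. (Converting the k-means cluster assignments into a single break via the midpoint between the two cluster centers is my convention here, not the package’s.)

    # Two-cluster breaks from three univariate clustering methods.
    library(classInt)
    library(Ckmeans.1d.dp)
    jenks  <- classIntervals(scores, n = 2, style = "jenks")$brks   # c(min, cut, max)
    fisher <- classIntervals(scores, n = 2, style = "fisher")$brks
    km     <- Ckmeans.1d.dp(scores, k = 2)
    c(jenks = jenks[2], fisher = fisher[2], kmeans = mean(km$centers))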

The graph below shows schools graphed as semi-transparent circles at their elite-ness scores in each sample. Schools with the same score and rank in a sample will sit atop one another and appear together as a darker circle. Schools with higher scores (lower, better ranks) appear to the right of schools with lower scores (higher, worse ranks). The horizontal distance between circles indicates the difference between the schools’ scores, whereas school rank is conveyed by their left-right ordering alone. The color-coded vertical bars show each clustering method’s preferred cut-points.

For Sample 1, the Jenks, Fisher, and k-means calculations line up with our visual “eyeball” inspection of the density curves, all three placing the break in the noticeable gap spanning the scores between 56 and 61. The average density minimum falls a bit higher, intersecting what looks to be an otherwise cohesive grouping of schools. I choose to divide the schools into groups at the gap favored by the majority of clustering techniques, which also accords with a commonsense reading of the various density curves, even if the average local minimum of those curves falls a bit outside the gap.

For Sample 2, the visual inspection and average local minimum of the densities place the break in the gap spanning scores between 67 and 72. I choose to divide the schools into groups at this gap, despite the tight grouping of breaks proposed by the other clustering techniques. Those alternate, k-means-based cut-points intersect an otherwise cohesive group of schools, whereas the local density minima were not only robust across kernel-bandwidth pairings but also fell consistently in this obvious gap.

The above graph shows how the different clustering techniques divide the schools when the number of clusters was set at 2. It is the same information conveyed in the previous graph, but rather than comparing cut-points, you can compare cluster assignments. I jittered the circles vertically to better show their distribution, but their horizontal positions remain unaltered. Schools of like color fall—according to the specified clustering technique—within the same elite-ness class.

Altering the number of clusters

Rather than looking at the data’s KDE, we can try to determine the appropriate number of clusters k using the Bayesian information criterion (BIC). The Ckmeans.1d.dp package reports what number of clusters maximizes the BIC for univariate k-means clustering.
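Concretely, handing Ckmeans.1d.dp() a range of candidate k values tells it to pick the BIC-maximizing number of clusters itself:

    # Let the package choose k: given a range, Ckmeans.1d.dp() selects the
    # number of clusters that maximizes BIC.
    library(Ckmeans.1d.dp)
    fit <- Ckmeans.1d.dp(scores, k = c(1, 6))
    length(fit$centers)  # BIC-optimal number of clusters
    fit$cluster          # each school's cluster assignment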

As you can see in the above graph, BIC (which I’ve normalized by sample size) is maximized in Sample 1 for k-means and k-median techniques when the number of clusters is set to two, and the cluster assignments are identical to those informed by visual inspection of the KDE. When applied to the Sample 2 data, which the KDE persuasively divided into two groups, BIC is maximized for k-means and k-median techniques when the number of clusters is set to one, suggesting that the scores from Sample 2 are best conceived (according to k-means) as belonging to a single group.

For the sake of completeness, the above graph shows cluster assignments for straight-ahead k-means methods, as well as Fisher’s and Jenks’ eponymous methods for optimizing natural breaks, as the number of clusters varies from two to six. All three methods endeavor to minimize within-group distances, and all three produce identical classifications for the displayed numbers of clusters, though they do eventually diverge. I find the arguments for more than two clusters visually unpersuasive: those divisions regularly lump schools that have proximate neighbors with distant counterparts. Schools on the fringes of an interval are forced to ally with schools across obvious gaps rather than with their immediate, tightly packed comrades. I levy the same criticism against the k-means division of Sample 2 when the number of clusters is held at two.


Final groupings

The below graph shows the final clustering that I use in my paper, for the reasons detailed above. Schools represented by blue circles are in the specified sample’s “elite” cluster. Schools represented by red circles are in the specified sample’s “ordinary” cluster. In both samples, ordinary schools sit in a tightly packed mass, while elite schools are comparatively few and strung out.

More important is how schools move across samples, which the below graph helps us understand. Lines connect the same school across the two samples. Schools twice categorized as elite are connected by blue lines. Schools always in the ordinary cluster are connected by red lines. Schools that switch clusters are connected by purple lines. The slope of a line indicates the severity of the school’s change in score from one sample to the next; a vertical line means the school received the same score in both samples. Intersecting lines indicate that a school changed rank across samples, and the number of lines it crosses equals the number of rank positions it fell or climbed. If a school’s line crosses no other, the school maintained the same rank.
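A bare-bones version of that slope graph takes only a few lines of base R. The elite500 and elite750 flags below are hypothetical columns marking each school’s cluster membership in each sample; derive them from whichever cut-points you prefer.

    # Minimal slope graph: one segment per school, Sample 1 score at the
    # bottom, Sample 2 score at the top, colored by cluster membership.
    # `elite500` and `elite750` are hypothetical logical columns of `d`.
    cols <- with(d, ifelse(elite500 & elite750, "blue",
                    ifelse(!elite500 & !elite750, "red", "purple")))
    plot(NULL, xlim = range(c(d$score500, d$score750)), ylim = c(1, 2),
         xlab = "Elite-ness score", ylab = "", yaxt = "n")
    axis(2, at = c(1, 2), labels = c("Sample 1", "Sample 2"), las = 1)
    segments(d$score500, 1, d$score750, 2, col = cols)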

school avgRank avgScore
Harvard University 1.0 89.5
Yale University 2.0 87.0
Princeton University 3.0 85.0
Stanford University 4.0 83.5
Johns Hopkins University 5.5 79.0
Duke University 6.0 79.0
Cornell University 6.0 78.5
Massachusetts Institute of Technology 9.0 75.0
University of Notre Dame 10.0 74.5
Dartmouth College 10.0 74.0
Columbia University 10.5 74.0
Brown University 11.0 72.5
Georgetown University 12.0 72.5
Vanderbilt University 14.5 68.5
University of California, Berkeley 15.0 66.0
Boston College 16.5 64.5
Boston University 16.5 64.5
Carnegie Mellon University 18.0 63.5
New York University 19.0 63.0
Northwestern University 20.0 62.0

There are 14 schools that shake out as elite in both samples, and an additional six schools that switch categories across samples. You can see these b-list “elite-ish” universities living in the 60–70 score range of both samples, connected across samples by purple segments in the above figure. They form a little independent-looking gaggle in the Sample 2 results. Hanging out among that elite-ish pack in Sample 2 is Purdue University, but unlike its neighbors, which find themselves similarly scored in Sample 1, Purdue is solidly in Sample 1’s ordinary crowd.

The 20 elite and elite-ish schools are listed in the nearby table, in the same color as their cross-sample lines. All elite schools are private, as are all b-list schools save UC Berkeley, and those schools speak with a predominantly East Coast accent. Two of the 20 are commonly thought of as engineering schools and three are Catholic. That Boston University and Boston College have identical average scores and ranks makes me wonder if people know the difference between them. Most interesting may be the schools that don’t achieve elite or b-list status. University of Pennsylvania is the only Ivy Leaguer that doesn’t make the cut, though just barely,5 and University of Chicago doesn’t come anywhere close. Also missing are nouveau riche institutions like George Washington University and Washington University in St. Louis.6

Except for the six elite-ish schools, ordinary schools only ever swap rank with one another. Same for elite schools. Indeed, if I take a small liberty and group elite with elite-ish, the schools split into two continents, one red and the other blue and purple.7 Migrations in rank are wholly intra-continental, save Purdue University’s Sample 2 sortie—the lone red bridge joining the otherwise insular landmasses, one lying above and one below 60 on the horizontal score axis.

Not only do the schools largely stay within their calculated categories, but the manner in which they move about those categories is exactly as we would expect. Ordinary schools do a lot of place-swapping between samples (criss-crossed red lines), often jumping or falling a great many spots in rank (red lines with a shallow slope crossing several other red lines). Elite schools have comparatively stable scores and rankings (steep, rarely intersecting lines). On average, an ordinary school moves 6.8 positions up or down with a standard deviation of 5.8 spots. Elite schools, on the other hand, move up or down the list a mere 1.4 spots, with a standard deviation of 1.5 spots. F, Levene, and Brown-Forsythe tests confirm that the mean and variance of rank changes (as absolute values) are greater for ordinary schools than for elite schools, all with p-values near zero. Survey participants ranked elite schools more boldly, more precisely than they did ordinary schools, whose appropriate place in the list is either unclear or up for debate.
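Those tests are standard fare in R; here is a sketch, assuming absRankChange holds each school’s absolute rank change and grp is a two-level factor (“elite”/“ordinary”), both hypothetical names.

    # Compare the spread of absolute rank changes across the two groups.
    # `absRankChange` and `grp` (a two-level factor) are hypothetical names.
    library(car)  # provides leveneTest()
    var.test(absRankChange ~ grp)                    # classical F test
    leveneTest(absRankChange ~ grp, center = mean)   # Levene's test
    leveneTest(absRankChange ~ grp, center = median) # Brown-Forsythe variant
    t.test(absRankChange ~ grp)                      # mean comparison (Welch)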

In sum, survey responses generate compelling evidence that Americans conceive of a distinct category of elite schools, numbering somewhere between 14 and 20. Perceptions of the relative status of these schools are stable and broadly shared. Non-elite schools bunch together in at least one distinct mass and therein trade standings with one another at a comparatively tumultuous rate. The mass of ordinary schools would have been denser and their ranks more roiling had my test been less conservative and participants been freer to deem competing schools equally elite. The same condensing may have occurred among elite schools, but that would only further solidify my argument that those schools constitute a distinct class.

In my forthcoming paper, which partly relies on the analysis and code here presented, I explain why this is a really messed up thing in a democracy. Like, really messed up. See you then!


  1. Honest to God, I ate an entire pack of store-bought cookies while writing this section, so please, appreciate this rare instance of self-restraint.

  2. The datasets examined in this document are, in wiki survey parlance, “opinion matrices” and “summarized opinion matrices”.

  3. The statistical model underlying All Our Ideas assumes that (i) “respondents’ responses reflect their relative preferences between items,” and (ii) “the distribution of preferences across respondents for each item follows a normal distribution”. The authors discuss these assumptions at length in supplementary material to their original article.

  4. See the outsideTurnPnts object in the R code.

  5. Like most Americans who didn’t attend UPenn, I always forget UPenn is in the Ivy League.

  6. Because nothing says “credible” like a hippopotamus-ivory-denture-wearing slave-owner.

  7. The makings of a third continent are hinted at on the far left of the final figures, where a gulf in scores separates the Colorado School of Mines and Yeshiva University from the 87 other schools. Assigning those two schools to a distinct class is not supported by any of the clustering methodologies, but it may be that—had I included a sufficient number of generally unfamiliar schools in my survey—cluster analyses would identify a third category of “obscure” schools.