library(DT)
LiberalArtsColleges <- subset(data, Type == "LiberalArts"); nrow(LiberalArtsColleges)
## [1] 24
NationalUniversities <- subset(data, Type == "NationalUniversity"); nrow(NationalUniversities)
## [1] 134
CRITR <- function(n, alpha = .05){
df <- n - 2; CRITT <- qt(alpha/2, df, lower.tail = F)
CRITR <- sqrt((CRITT^2)/((CRITT^2) + df))
return(CRITR)}
ShannonH <- function(Props, rnd = 3){
LPI = log(Props)
H = (-rowSums(Props * LPI))
return(round(H, rnd))}
SimpsonD <- function(Props, rnd = 3){
PI2 = Props^2
D = 1/(rowSums(PI2))
return(round(D, rnd))}
RenyiH <- function(Props, rnd = 3){
q = rowSums(Props)
QH = (1/(1-q)) * log(rowSums(Props^q))
return(round(QH, rnd))}
EthnicPolarization <- function(Props, rnd = 3){
PI2 = Props^2
EP = 4 * rowSums(PI2 * (1 - Props))
return(round(EP, rnd))}
round(CRITR(c(24, 134, 158)), 3)
## [1] 0.404 0.170 0.156
datatable(data, extensions = c("Buttons", "FixedColumns"), options = list(dom = 'Bfrtip', buttons = c('copy', 'csv', 'print'), scrollX = T, fixedColumns = list(leftColumns = 3)))
Eric Kaufmann shared the Foundation for Individual Rights in Education’s (FIRE’s) 2020 and 2021 combined campus free speech ranking survey data on political alignment by college university on his website, here: http://www.sneps.net/viewpoint-diversity-ranking-of-leading-us-colleges. He used a variable dubbed “IVD”, or the Index of Viewpoint Diversity, where lower scores represented higher diversity. This index is computed as
\[\text{IVD} = ((P(\text{Democrat}) - P(\text{Republican})) + (P(\text{Liberal}) - P(\text{Conservative}))) \times 10,000\]
Unfortunately, this is hardly a diversity index. It is, instead, better understood, as computed, as the proportion of ideological and effective liberals, where these quantities overlap and may fail to overlap due to measurement error. It will only coincidentally correlate with actual diversity indices because of the overrepresentation of liberals and/or Democrats at universities, because the residual is often too small to mean much else. To better understand diversity, we can apply two real diversity indices from ecology to this data.
The first index is Shannon’s index, H, given by
\[H = - \sum_{i = 1}^s p_i \text{ln}p_i = - \sum_{i = 1}^s \text{ln}p_i^{p_i}\]
As the name suggests, this is an information theoretic index, and like another popular information theoretic index, BIC, it has assumptions about representation. With BIC, it is assumed that the true model is among those compared; with H, it is assumed that all categories - or more commonly in ecology, species - are represented in computation of the statistic. Sampling accuracy is also assumed. The next index is Simpson’s, D, given by
\[D = \frac{1}{\sum_{i = 1}^s p_i^2}\]
Simpson’s D is a dominance index, since it is weighted towards those categories - or more commonly in ecology, species - which are more common, and as such, dominant in the sample. This index can be useful in our case because of the aforementioned abundance of liberal individuals in the academy. This index will be less swayed by measurement error, because it will be less swayed by the small groups that are more likely in our case to be affected by them. It makes much more difference to measure 10% of the population with a margin of error of 5% than it does to measure 70% of the population with the same amount of error. The practical implications of small differences matter when it comes to understanding “diversity” in a population.
Shannon’s index is generalized to values of q - the sum of proportions - other than one as Renyi’s index, given by
\[^qH = \frac{1}{1 - q} \text{ln}(\sum_{i = 1}^s p_i^q)\]
Alternative names for Shannon’s index are Shannon’s diversity index, the Shannon-Wiener index, Shannon entropy, information gain (machine learning), and von Neumann entropy (quantum statistical mechanics). Alternative names for Simpson’s index are the Herfindahl-Hirschman index (economics), the inverse participation ratio (physics), the effective number of parties (political science; as its inverse), and heterozygosity (genetics; as the Gibbs-Martin, Blau, or Gini-Simpson indices, simple extensions of Simpson’s). There are numerous other appropriate indices, many, but of course not all, of which are used for analyzing economic inequality, like the Theil, Hoover, Robin Hood, or Atkinson indices, or any number of named ratios. The Hoover index is just the Index of Dissimilarity and Gorard’s index of segregation. Many of these are similar to the other Index of Segregation, and related to Lieberson’s isolation index, the Hutchen’s square root index, Bell’s index, the other index of isolation, the index of exposure, the Ochiai index, Kulczynski’s coefficient, Yule’s Q or Y, the Hamman coefficient, the Baroni-Urbani-Buser coefficient, the Rogers-Tanimoto coefficient, the Sokal-Sneath coefficient, Sokal’s binary distance, the Russsel-Rao coefficient, the Phi coefficient, Soergel’s coefficient, Simpson’s coefficient (not to be confused with Simpson’s index), Dennis’ coefficient, Forbes’ coefficient, the Simple match coefficient, Fossum’s coefficient, Stile’s coefficient, Michael’s coefficient, Peirce’s coefficient, the Hawkin-Dotson coefficient, the Benini coefficient, the Gilbert coefficient, the modified Gini index, Kuhn’s index, the Gini index (not to be confused with the modified Gini index or the Gini-Simpson index), the Eyraud index, Soergel’s distance, the Tanimoto index, the Piatetsky-Shapiro’s index, or various other similar and oftentimes identical indices like Jaccard’s, Dice’s, the Match coefficient, Morisita’s index, the Canberra metric, or the Bray-Curtis index (i.e., Czekanowski’s quantitative index). There are more indices.
In addition to Shannon’s, Simpson’s, and Renyi’s indices, I have provided code for the ethnic polarization index used by Montalvo & Reynal-Querol (2005). This is given by
\[\text{EP} = 4\sum_{i = 1}^s p_i^2(1-p_i)\]
Shannon’s index is interpreted such that higher values indicate more diversity, like Simpson’s and Renyi’s indices. The ethnic polarization index ranges between 0, where everyone is part of one category or as many categories as there are people, and 1, where there are two groups of the same size. It is odder and harder to interpret, I am just presenting it for fun.
Each of these diversity indices is useful and interesting in their own ways, and they can be handily employed to better quantify political diversity in FIRE’s dataset. Moreover, we can use FIRE’s data and diversity indices like these and Kaufmann’s preferred IVD to assess the relationship between the quality - or at least, the rankings - of universities and student political diversity. Proponents of college viewpoint diversity would predict that a student body with diverse views should achieve higher-quality schools and accordingly higher ranks. For this purpose, I will use the ranks, scores, and categories - National Universities versus Liberal Arts Colleges - for the universities in Kaufmann’s data, taken from the U.S. News and World Report. I have used the most-recent scores as of October 3rd, 2022 for this purpose. Values of Renyi’s index use ideology and party identification, since Renyi’s index is not problematized by that sort of analysis, and it will allow testing the correspondence to Kaufmann’s own IVD.
Critical values of the Spearman correlation are .404 for the sample of liberal arts colleges and .170 for the national universities. The aggregate sample critical value is .156 because
\[t = r\sqrt{\frac{n-2}{1-r^2}}\]
and thus,
\[r = -\frac{t\sqrt{n-2+t^2}}{n-2+t^2}\]
nLA = 24; tLA = qt(.05/2, nLA - 2)
nNU = 134; tNU = qt(.05/2, nNU - 2)
nAG = 158; tAG = qt(.05/2, nAG - 2)
rLA = -(tLA * sqrt(nLA - 2 + tLA^2))/(nLA - 2 + tLA^2); round(rLA, 3) #-(tLA...) because the coefficient is positive. Remove the '-' for negative coefficients, or just imagine it's reversed
## [1] 0.404
rNU = -(tNU * sqrt(nNU - 2 + tNU^2))/(nNU - 2 + tNU^2); round(rNU, 3)
## [1] 0.17
rAG = -(tAG * sqrt(nAG - 2 + tAG^2))/(nAG - 2 + tAG^2); round(rAG, 3)
## [1] 0.156
r2t <- function(r, n){
t = r * sqrt((n-2)/(1-r^2))
return(t)}
data$HDiversity <- ShannonH(data[4:6])
data$DDiversity <- SimpsonD(data[4:6])
data$RDiversity <- RenyiH(data[4:8])
data$EDiversity <- EthnicPolarization(data[4:6])
data
"USNWR Ranks"
## [1] "USNWR Ranks"
cor(data$IVD, data$USNWRRank, method = "pearson") #Wrong type of correlation; only presented for comparison
## [1] 0.05312508
cor(data$IVD, data$USNWRRank, method = "spearman")
## [1] -0.5058135
cor(data$HDiversity, data$USNWRRank, method = "spearman")
## [1] 0.5738253
cor(data$DDiversity, data$USNWRRank, method = "spearman")
## [1] 0.564462
cor(data$RDiversity, data$USNWRRank, method = "spearman")
## [1] 0.5622851
cor(data$EDiversity, data$USNWRRank, method = "spearman")
## [1] 0.4034535
"USNWR Scores"
## [1] "USNWR Scores"
cor(data$IVD, data$USNWRScore, method = "pearson")
## [1] -0.03021161
cor(data$IVD, data$USNWRScore, method = "spearman")
## [1] 0.4963161
cor(data$HDiversity, data$USNWRScore, method = "spearman")
## [1] -0.5645527
cor(data$DDiversity, data$USNWRScore, method = "spearman")
## [1] -0.5556395
cor(data$RDiversity, data$USNWRScore, method = "spearman")
## [1] -0.5528564
cor(data$EDiversity, data$USNWRScore, method = "spearman")
## [1] -0.3956895
pt(r2t(.4034535, nAG), nAG - 2, lower.tail = F) #Lowest correlation in the bunch for the ranks
## [1] 7.35207e-08
LiberalArtsColleges$HDiversity <- ShannonH(LiberalArtsColleges[4:6])
LiberalArtsColleges$DDiversity <- SimpsonD(LiberalArtsColleges[4:6])
LiberalArtsColleges$RDiversity <- SimpsonD(LiberalArtsColleges[4:8])
LiberalArtsColleges$EDiversity <- SimpsonD(LiberalArtsColleges[4:6])
LiberalArtsColleges
"USNWR Ranks"
## [1] "USNWR Ranks"
cor(LiberalArtsColleges$IVD, LiberalArtsColleges$USNWRRank, method = "pearson")
## [1] 0.1577661
cor(LiberalArtsColleges$IVD, LiberalArtsColleges$USNWRRank, method = "spearman")
## [1] -0.07883287
cor(LiberalArtsColleges$HDiversity, LiberalArtsColleges$USNWRRank, method = "spearman")
## [1] 0.09756112
cor(LiberalArtsColleges$DDiversity, LiberalArtsColleges$USNWRRank, method = "spearman")
## [1] 0.01655055
cor(LiberalArtsColleges$RDiversity, LiberalArtsColleges$USNWRRank, method = "spearman")
## [1] 0.05008718
cor(LiberalArtsColleges$EDiversity, LiberalArtsColleges$USNWRRank, method = "spearman")
## [1] 0.01655055
"USNWR Scores"
## [1] "USNWR Scores"
cor(LiberalArtsColleges$IVD, LiberalArtsColleges$USNWRScore, method = "pearson")
## [1] -0.1774138
cor(LiberalArtsColleges$IVD, LiberalArtsColleges$USNWRScore, method = "spearman")
## [1] 0.07883287
cor(LiberalArtsColleges$HDiversity, LiberalArtsColleges$USNWRScore, method = "spearman")
## [1] -0.09756112
cor(LiberalArtsColleges$DDiversity, LiberalArtsColleges$USNWRScore, method = "spearman")
## [1] -0.01655055
cor(LiberalArtsColleges$RDiversity, LiberalArtsColleges$USNWRScore, method = "spearman")
## [1] -0.05008718
cor(LiberalArtsColleges$EDiversity, LiberalArtsColleges$USNWRScore, method = "spearman")
## [1] -0.01655055
pt(r2t(.01655055, nLA), nLA - 2, lower.tail = F)
## [1] 0.4694083
NationalUniversities$HDiversity <- ShannonH(NationalUniversities[4:6])
NationalUniversities$DDiversity <- SimpsonD(NationalUniversities[4:6])
NationalUniversities$RDiversity <- SimpsonD(NationalUniversities[4:8])
NationalUniversities$EDiversity <- SimpsonD(NationalUniversities[4:6])
NationalUniversities
"USNWR Ranks"
## [1] "USNWR Ranks"
cor(NationalUniversities$IVD, NationalUniversities$USNWRRank, method = "pearson")
## [1] 0.1456497
cor(NationalUniversities$IVD, NationalUniversities$USNWRRank, method = "spearman")
## [1] -0.4208464
cor(NationalUniversities$HDiversity, NationalUniversities$USNWRRank, method = "spearman")
## [1] 0.5115962
cor(NationalUniversities$DDiversity, NationalUniversities$USNWRRank, method = "spearman")
## [1] 0.5076282
cor(NationalUniversities$RDiversity, NationalUniversities$USNWRRank, method = "spearman")
## [1] 0.4801009
cor(NationalUniversities$EDiversity, NationalUniversities$USNWRRank, method = "spearman")
## [1] 0.5076282
"USNWR Scores"
## [1] "USNWR Scores"
cor(NationalUniversities$IVD, NationalUniversities$USNWRScore, method = "pearson")
## [1] -0.1020902
cor(NationalUniversities$IVD, NationalUniversities$USNWRScore, method = "spearman")
## [1] 0.4208464
cor(NationalUniversities$HDiversity, NationalUniversities$USNWRScore, method = "spearman")
## [1] -0.5115962
cor(NationalUniversities$DDiversity, NationalUniversities$USNWRScore, method = "spearman")
## [1] -0.5076282
cor(NationalUniversities$RDiversity, NationalUniversities$USNWRScore, method = "spearman")
## [1] -0.4801009
cor(NationalUniversities$EDiversity, NationalUniversities$USNWRScore, method = "spearman")
## [1] -0.5076282
pt(r2t(.4801009, nNU), nNU - 2, lower.tail = F)
## [1] 2.178352e-09
Student political diversity is negatively associated with college rankings. This effect is significant in aggregate and for national universities, but not for liberal arts colleges, but that seems likely to be attributable to sample sizes and resulting range restriction. The interaction by type of college is significant, but that’s hardly worth much with such low power for the liberal arts college sample in the first place. The implications of the observation that diversity of viewpoint, at least in terms of politics, is uncorrelated with the quality of the university are potentially extreme. It may be that elite universities are ideological echo chambers. However, the imperfectness of the measurement of diversity along the dimensions used here suggests the need for future investigation with more granularity. Whatever the eventual conclusion, this finding is interesting.
Montalvo, J. G., & Reynal-Querol, M. (2005). Ethnic diversity and economic development. Journal of Development Economics, 76(2), 293–323.
Someone asked for correlation matrices.
round(cor(data[c(2:9, 11, 13:16)], method = "spearman"), 3)
## IVDRank IVD PLib PMod PCon PDem PRep USNWRScore
## IVDRank 1.000 0.853 0.923 -0.585 -0.903 0.910 -0.920 0.572
## IVD 0.853 1.000 0.801 -0.511 -0.791 0.787 -0.805 0.496
## PLib 0.923 0.801 1.000 -0.689 -0.863 0.946 -0.885 0.537
## PMod -0.585 -0.511 -0.689 1.000 0.306 -0.600 0.384 -0.259
## PCon -0.903 -0.791 -0.863 0.306 1.000 -0.864 0.963 -0.593
## PDem 0.910 0.787 0.946 -0.600 -0.864 1.000 -0.893 0.522
## PRep -0.920 -0.805 -0.885 0.384 0.963 -0.893 1.000 -0.575
## USNWRScore 0.572 0.496 0.537 -0.259 -0.593 0.522 -0.575 1.000
## USNWRRank -0.583 -0.506 -0.547 0.264 0.603 -0.531 0.587 -0.995
## HDiversity -0.977 -0.843 -0.922 0.631 0.891 -0.898 0.895 -0.565
## DDiversity -0.973 -0.844 -0.929 0.661 0.869 -0.899 0.883 -0.556
## RDiversity -0.977 -0.839 -0.929 0.663 0.871 -0.933 0.888 -0.553
## EDiversity -0.751 -0.664 -0.813 0.843 0.553 -0.735 0.603 -0.396
## USNWRRank HDiversity DDiversity RDiversity EDiversity
## IVDRank -0.583 -0.977 -0.973 -0.977 -0.751
## IVD -0.506 -0.843 -0.844 -0.839 -0.664
## PLib -0.547 -0.922 -0.929 -0.929 -0.813
## PMod 0.264 0.631 0.661 0.663 0.843
## PCon 0.603 0.891 0.869 0.871 0.553
## PDem -0.531 -0.898 -0.899 -0.933 -0.735
## PRep 0.587 0.895 0.883 0.888 0.603
## USNWRScore -0.995 -0.565 -0.556 -0.553 -0.396
## USNWRRank 1.000 0.574 0.564 0.562 0.403
## HDiversity 0.574 1.000 0.994 0.985 0.758
## DDiversity 0.564 0.994 1.000 0.984 0.766
## RDiversity 0.562 0.985 0.984 1.000 0.772
## EDiversity 0.403 0.758 0.766 0.772 1.000
round(cor(LiberalArtsColleges[c(2:9, 11, 13:16)], method = "spearman"), 3)
## IVDRank IVD PLib PMod PCon PDem PRep USNWRScore
## IVDRank 1.000 0.760 0.871 -0.797 -0.768 0.824 -0.767 0.194
## IVD 0.760 1.000 0.652 -0.598 -0.548 0.605 -0.547 0.079
## PLib 0.871 0.652 1.000 -0.725 -0.764 0.908 -0.786 0.136
## PMod -0.797 -0.598 -0.725 1.000 0.361 -0.635 0.374 0.131
## PCon -0.768 -0.548 -0.764 0.361 1.000 -0.700 0.924 -0.311
## PDem 0.824 0.605 0.908 -0.635 -0.700 1.000 -0.766 0.187
## PRep -0.767 -0.547 -0.786 0.374 0.924 -0.766 1.000 -0.309
## USNWRScore 0.194 0.079 0.136 0.131 -0.311 0.187 -0.309 1.000
## USNWRRank -0.194 -0.079 -0.136 -0.131 0.311 -0.187 0.309 -1.000
## HDiversity -0.969 -0.729 -0.852 0.835 0.790 -0.766 0.753 -0.098
## DDiversity -0.948 -0.708 -0.862 0.892 0.700 -0.788 0.713 -0.017
## RDiversity -0.928 -0.688 -0.826 0.897 0.627 -0.840 0.662 -0.050
## EDiversity -0.948 -0.708 -0.862 0.892 0.700 -0.788 0.713 -0.017
## USNWRRank HDiversity DDiversity RDiversity EDiversity
## IVDRank -0.194 -0.969 -0.948 -0.928 -0.948
## IVD -0.079 -0.729 -0.708 -0.688 -0.708
## PLib -0.136 -0.852 -0.862 -0.826 -0.862
## PMod -0.131 0.835 0.892 0.897 0.892
## PCon 0.311 0.790 0.700 0.627 0.700
## PDem -0.187 -0.766 -0.788 -0.840 -0.788
## PRep 0.309 0.753 0.713 0.662 0.713
## USNWRScore -1.000 -0.098 -0.017 -0.050 -0.017
## USNWRRank 1.000 0.098 0.017 0.050 0.017
## HDiversity 0.098 1.000 0.975 0.933 0.975
## DDiversity 0.017 0.975 1.000 0.963 1.000
## RDiversity 0.050 0.933 0.963 1.000 0.963
## EDiversity 0.017 0.975 1.000 0.963 1.000
round(cor(NationalUniversities[c(2:9, 11, 13:16)], method = "spearman"), 3)
## IVDRank IVD PLib PMod PCon PDem PRep USNWRScore
## IVDRank 1.000 0.869 0.952 -0.520 -0.927 0.936 -0.947 0.522
## IVD 0.869 1.000 0.846 -0.475 -0.826 0.826 -0.842 0.421
## PLib 0.952 0.846 1.000 -0.670 -0.872 0.944 -0.893 0.528
## PMod -0.520 -0.475 -0.670 1.000 0.271 -0.570 0.367 -0.258
## PCon -0.927 -0.826 -0.872 0.271 1.000 -0.862 0.950 -0.531
## PDem 0.936 0.826 0.944 -0.570 -0.862 1.000 -0.894 0.497
## PRep -0.947 -0.842 -0.893 0.367 0.950 -0.894 1.000 -0.520
## USNWRScore 0.522 0.421 0.528 -0.258 -0.531 0.497 -0.520 1.000
## USNWRRank -0.522 -0.421 -0.528 0.258 0.531 -0.497 0.520 -1.000
## HDiversity -0.970 -0.853 -0.960 0.597 0.903 -0.922 0.907 -0.512
## DDiversity -0.965 -0.852 -0.961 0.617 0.886 -0.918 0.900 -0.508
## RDiversity -0.925 -0.811 -0.939 0.685 0.815 -0.958 0.831 -0.480
## EDiversity -0.965 -0.852 -0.961 0.617 0.886 -0.918 0.900 -0.508
## USNWRRank HDiversity DDiversity RDiversity EDiversity
## IVDRank -0.522 -0.970 -0.965 -0.925 -0.965
## IVD -0.421 -0.853 -0.852 -0.811 -0.852
## PLib -0.528 -0.960 -0.961 -0.939 -0.961
## PMod 0.258 0.597 0.617 0.685 0.617
## PCon 0.531 0.903 0.886 0.815 0.886
## PDem -0.497 -0.922 -0.918 -0.958 -0.918
## PRep 0.520 0.907 0.900 0.831 0.900
## USNWRScore -1.000 -0.512 -0.508 -0.480 -0.508
## USNWRRank 1.000 0.512 0.508 0.480 0.508
## HDiversity 0.512 1.000 0.994 0.945 0.994
## DDiversity 0.508 0.994 1.000 0.948 1.000
## RDiversity 0.480 0.945 0.948 1.000 0.948
## EDiversity 0.508 0.994 1.000 0.948 1.000
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] DT_0.20
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.28 R6_2.5.1 jsonlite_1.7.2 magrittr_2.0.1
## [5] evaluate_0.14 rlang_0.4.12 stringi_1.7.5 jquerylib_0.1.4
## [9] bslib_0.3.1 vctrs_0.3.8 rmarkdown_2.11 tools_4.1.2
## [13] stringr_1.4.0 htmlwidgets_1.5.4 crosstalk_1.2.0 xfun_0.27
## [17] yaml_2.2.1 fastmap_1.1.0 compiler_4.1.2 htmltools_0.5.2
## [21] knitr_1.36 sass_0.4.0