1 Abstract

An opinion piece by Abigail Thompson in the Notices of the AMS has engendered a lot of discussion, including not one, not two, but three open letters to the Notices. We analyze the professional profiles of the signers of the three letters and propose an explanation for the differences we find.

2 Introduction

In November 2019, Abigail Thompson, chair of the Department of Mathematics at UC Davis and Vice President of the American Mathematical Society, published an essay [1] in the Notices of the AMS that criticized the use of mandatory diversity statements in hiring mathematics faculty. She described diversity statements as a “political test” and compared them to McCarthyism.

In December 2019, a number of responses to Thompson’s essay were published in the Notices [2], together accumulating hundreds of signatures. One letter, titled “The math community values a commitment to diversity,” “strongly [disagreed] with the sentiments and arguments in Dr. Thompson’s editorial” and hoped “that the AMS will reconsider the way that it uses its power and position in the mathematics communities in these kinds of discussions.” Another letter, titled “Letter to the Editor,” expressed “grave concerns about recent attempts to intimidate a voice within our mathematical community.” This letter references a blog post [3] which encouraged faculty to “advise grad-school-bound undergraduate students – especially students who are minoritized along some axis – not to apply to UC Davis.” A third letter, titled “Letter to the Notices of the AMS,” criticized the use of mandatory diversity statements but affirmed the importance of diversity in mathematics.

For the purpose of this exploration, Letter A will refer to the letter titled “The math community values a commitment to diversity,” Letter B will refer to the one titled “Letter to the Editor,” and Letter C will refer to the one titled “Letter to the Notices of the AMS.”

We analyze the age (years since PhD) and gender of the letter signers, together with their MathSciNet citations and citations per year, and their Google Scholar citations, citations per year, and h-index. We also assess the distributions of MathSciNet and Google Scholar citations. We prefer MathSciNet because it indexes only published mathematics, which makes it a higher-quality data source when comparing mathematicians.

We emphasize that citations and h-indices do not impose a total order on the quality of mathematicians; indeed, unlike in competitive swimming, imposing such an order is a fool’s errand. For example, Stephen Smale has fewer citations than Terence Tao, but it would be very difficult to determine which of them is in fact the better mathematician. However, citations generally reflect the mathematical community’s opinion of a person, and they are the only empirical metric available for assessing it.

Most of the rest of this paper describes the methodology of our statistical analyses; the impatient reader can skip straight to Section 6, Conclusion and Discussion.

3 Data Collection

Data were collected on December 16-18, 2019 from Google Scholar and the Mathematics Genealogy Project. After a list of names and affiliations was scraped from the AMS response letters, each signer was searched on Google Scholar [5] and their citation count and h-index were collected, using the scholarly Python package [4] together with manual checks. Then the math-genealogy-scraper [7] was used to retrieve PhD years from the Mathematics Genealogy Project [6], and errors (such as duplicate names) were corrected manually. MathSciNet citation counts were collected manually. Finally, the data were merged with the QSIDE dataset released on December 28th [8].
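The final merge step can be sketched in base R as follows. This is a minimal illustration only, assuming that both tables carry a `name` column; the toy rows and the helper `normalise_name()` are hypothetical and not taken from the actual pipeline.

```r
# Hypothetical helper: normalise names so that spacing and case
# differences do not prevent a match
normalise_name <- function(x) tolower(trimws(gsub("\\s+", " ", x)))

# Toy stand-ins for the scraped signer list and the QSIDE table
scraped <- data.frame(name = c("Jane  Doe", "John Roe "),
                      citations = c(120, 45))
qside   <- data.frame(name = c("jane doe", "Ann Poe"),
                      gender = c("woman", "woman"))

scraped$key <- normalise_name(scraped$name)
qside$key   <- normalise_name(qside$name)

# Left join: keep every scraped signer; QSIDE columns are NA where no match
merged <- merge(scraped, qside[, c("key", "gender")], by = "key", all.x = TRUE)
merged
```

Name matching of this kind is fragile (initials, accents, duplicate names), which is why manual checks are still needed.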

4 Summary Statistics

##        X1             name           affiliation          citations    
##  Min.   :   0.0   Length:1435        Length:1435        Min.   :    1  
##  1st Qu.: 358.5   Class :character   Class :character   1st Qu.:  158  
##  Median : 717.0   Mode  :character   Mode  :character   Median :  831  
##  Mean   : 717.0                                         Mean   : 2796  
##  3rd Qu.:1075.5                                         3rd Qu.: 2840  
##  Max.   :1434.0                                         Max.   :71530  
##                                                         NA's   :920    
##      hindex            year       lettergroup        gender   
##  Min.   :  1.00   Min.   :1957   A and B:  6   man      :969  
##  1st Qu.:  6.00   1st Qu.:1984   A Only :615   nonbinary:  1  
##  Median : 15.00   Median :1999   B and C: 74   woman    :452  
##  Mean   : 18.45   Mean   :1996   B Only :600   NA's     : 13  
##  3rd Qu.: 26.00   3rd Qu.:2009   C Only :134                  
##  Max.   :106.00   Max.   :2019   NA's   :  6                  
##  NA's   :922      NA's   :553                                 
##         highered           institution    research             country    
##  highered   :1361   domesticother:379   lessri:487   israel        :  46  
##  nothighered:  41   domesticr1   :674   moreri:878   canada        :  37  
##  NA's       :  33   domesticr2   :107   NA's  : 70   united kingdom:  21  
##                     international:205                germany       :  14  
##                     NA's         : 70                france        :  13  
##                                                      (Other)       :  49  
##                                                      NA's          :1255  
##         role           security      field        simplefield  
##  professor:635   lesssecure:425   comp  :  16   mathed  :  33  
##  associate:202   moresecure:902   math  :1216   mathstat:1228  
##  assistant:192   NA's      :108   mathed:  33   other   :  90  
##  grad     :119                    other :  74   NA's    :  84  
##  ntt      : 97                    stat  :  12                  
##  (Other)  : 83                    NA's  :  84                  
##  NA's     :107                                                 
##   fellows            amscit           Fields               age       
##  Mode :logical   Min.   :    0.0   Length:1435        Min.   : 1.00  
##  FALSE:1223      1st Qu.:  447.5   Class :character   1st Qu.:11.00  
##  TRUE :206       Median : 1006.0   Mode  :character   Median :21.00  
##  NA's :6         Mean   : 1650.5                      Mean   :23.87  
##                  3rd Qu.: 1813.5                      3rd Qu.:36.00  
##                  Max.   :15430.0                      Max.   :63.00  
##                  NA's   :1131                         NA's   :553    
##    citperyear       amscitperyear   
##  Min.   :   0.118   Min.   :  0.00  
##  1st Qu.:  18.688   1st Qu.: 15.96  
##  Median :  48.062   Median : 27.55  
##  Mean   : 113.541   Mean   : 44.95  
##  3rd Qu.: 111.271   3rd Qu.: 52.19  
##  Max.   :3223.524   Max.   :642.92  
##  NA's   :1056       NA's   :1188

5 Exploratory Data Analysis

5.1 NaN Visualization

The citations and hindex columns refer to citations and h-indices pulled from Google Scholar. The amscit column refers to citations pulled from MathSciNet. The year column refers to PhD years scraped from the Mathematics Genealogy Project. The age column was derived by subtracting the PhD year from 2020, so it measures years since PhD. The citperyear and amscitperyear columns were derived by dividing citations and amscit, respectively, by age. The fellows column indicates whether an individual is a Fellow of the AMS.
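The derived columns can be reproduced with a few lines of base R. This is a sketch assuming the column names shown in the summary table (year, citations, amscit); the toy values are made up for illustration.

```r
# Toy rows standing in for df (hypothetical values, not real signers)
toy <- data.frame(
  year      = c(2010, 1990, NA),
  citations = c(300, 1500, 80),
  amscit    = c(150, NA, 40)
)

# Years since PhD, measured from 2020
toy$age <- 2020 - toy$year

# Citations per year since PhD (Google Scholar and MathSciNet);
# a missing year or citation count propagates as NA
toy$citperyear    <- toy$citations / toy$age
toy$amscitperyear <- toy$amscit / toy$age

toy
```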

64.11% of the Google Scholar citation values (920 of 1435) and 78.82% of the MathSciNet citation values (1131 of 1435) are missing (NA). While this is not optimal, a quick sample size calculation shows that one needs 303 samples, or 21% of the data, to produce statistics at a 95% confidence level with a 5% margin of error.
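The sample size quoted above is consistent with Cochran's formula for estimating a proportion, using the worst case p = 0.5 and a finite population correction for the N = 1435 signers. We assume this is the intended calculation; the function below is a sketch, not the authors' code.

```r
# Cochran's sample size for a proportion, with a finite population correction
sample_size <- function(N, conf = 0.95, margin = 0.05, p = 0.5) {
  z  <- qnorm(1 - (1 - conf) / 2)        # about 1.96 for 95% confidence
  n0 <- z^2 * p * (1 - p) / margin^2     # infinite-population size (about 384)
  n0 / (1 + (n0 - 1) / N)                # correct for a finite population
}

n <- sample_size(N = 1435)
round(n)            # 303
round(n / 1435, 2)  # 0.21
```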

5.2 Distribution of Google Scholar Citations

The data are heavily right-skewed: the mean (2796) is more than three times the median (831). This is usually handled by transforming the data, either with a natural logarithm or with a square root. A logarithm is the more appropriate transformation for assessing normality (of the log data), and a square root is more appropriate for assessing exponentiality. We try both.

After taking logarithms, the data are now left-skewed.

The log data are concentrated towards the center, and the tails are sparsely populated, so the data are unlikely to be normally distributed.

If we construct a Q-Q plot against a fitted exponential distribution, we find that the data diverge from it in the tails.

We now apply a square root.

This looks like an exponential distribution.

After square rooting, an exponential distribution seems to fit the data better. Some fine tuning shows that raising the data to the power 0.46 produces the closest approximation to an exponential distribution.
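One way to carry out such fine tuning is a grid search over exponents, scoring each candidate by the correlation between the sorted transformed data and theoretical exponential quantiles (a numerical summary of the Q-Q plot). The criterion, the grid, and the simulated data below are our assumptions; the original analysis may have used a different goodness-of-fit measure.

```r
# Correlation between sorted x^p and Exp(1) quantiles; correlation is
# scale invariant, so the exponential rate does not need to be estimated
qq_exp_cor <- function(x, p) {
  y <- sort(x^p)
  q <- qexp(ppoints(length(y)))  # theoretical quantiles at plotting positions
  cor(y, q)
}

# Exponent on the grid whose transformed data look most exponential
best_power <- function(x, grid = seq(0.3, 0.7, by = 0.01)) {
  scores <- vapply(grid, function(p) qq_exp_cor(x, p), numeric(1))
  grid[which.max(scores)]
}

set.seed(1)
x <- rexp(2000)^2   # simulated data whose square root is exactly Exp(1)
best_power(x)       # should land near 0.5
```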

5.3 Distribution of MathSciNet citations

The MathSciNet data are again right-skewed (mean 1650.5, median 1006). Fitting an exponential distribution to these data, we see that there is divergence in the tails.

We now apply a square root.

The data again look approximately exponentially distributed, except in the left tail. This divergence is caused by a number of mathematicians with zero citations. Some fine tuning shows that raising the data to the power 0.54 produces the closest approximation to an exponential distribution.

If we check whether the log-transformed data are normally distributed, we again see sparsity in the tails, so the data are unlikely to be log-normal.

So the MathSciNet and Google Scholar data, when appropriately transformed, appear to be approximately exponentially distributed.

5.4 Permutation Tests

A permutation test is a nonparametric way of assessing the difference in means between two populations. We are interested in whether an observed difference in means could plausibly be due to chance, which we assess as follows.

1. Record the observed difference in means (\(d\mu\)).
2. State the hypotheses \(H_0: d\mu = 0\) and \(H_1: d\mu < 0\).
3. Randomly split the pooled data set, without replacement, into one half (X) and the remainder (Y).
4. Take the means of X and Y and record their difference.
5. Repeat 10,000 times and plot the histogram of the differences.
6. Record the number m of points in the induced distribution that are at least as extreme as the observed \(d\mu\). The proportion m/10,000 estimates the probability of a difference this extreme arising by chance alone, i.e. under \(H_0\).

Here is the function we will use to do this.

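# Null distribution for the permutation test.
# Data: vector of values with NAs removed; n: number of permutations.
# Returns an n x 1 matrix of differences mean(X) - mean(Y), where X is a
# random half of Data and Y is the remainder.
meanPermutation <- function(Data, n){
  N <- length(Data)
  output <- matrix(NA_real_, ncol = 1, nrow = n)
  for(i in seq_len(n)){
    # sample half of the indices without replacement; the rest form Y
    X_index <- sample(seq_len(N), floor(0.5 * N))
    Y_index <- setdiff(seq_len(N), X_index)
    # store the difference in means
    output[i, ] <- mean(Data[X_index]) - mean(Data[Y_index])
  }
  return(output)
}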

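The induced probability is an estimate of a permutation p-value, and it is often close to the p-value of a two-sample t-test. However, because the null distribution here is built from random halves of the whole data set passed to meanPermutation rather than from the two groups being compared, it should not be read too literally against the standard 0.05 significance benchmark. Instead, probabilities are assessed relatively.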
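For comparison, a textbook two-sample permutation test pools only the two groups being compared and reshuffles the group labels, keeping the original group sizes. The sketch below is a swapped-in variant, not the meanPermutation procedure used in this paper; the function name perm_test_less is hypothetical, and it returns the one-sided proportion m/n_perm described in the steps above.

```r
# One-sided permutation test of H0: mean(a) - mean(b) = 0 vs H1: < 0
perm_test_less <- function(a, b, n_perm = 10000) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  observed <- mean(a) - mean(b)
  pooled <- c(a, b)          # pool only the two groups being compared
  n_a <- length(a)
  null_diffs <- replicate(n_perm, {
    idx <- sample.int(length(pooled), n_a)   # random relabelling, sizes kept
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # proportion of relabelled differences at least as extreme as observed
  list(observed = observed, p = mean(null_diffs <= observed))
}

set.seed(0)
res <- perm_test_less(1:5, 6:10, n_perm = 2000)
res$p
```

Here the observed difference is -5, the most extreme split possible, so the proportion should be close to 1/252, about 0.004.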

5.5 Gender

What proportion of the professors who signed Letters A, B, and C are women? According to a 2016 AMS survey [9], 707/4902 = 14.4% of tenured professors are women, and including all professionals, 2004/9921 = 20.2% are women.

We can determine this by using dplyr’s filter function.

#search via booleans
table(filter(df,((lettergroup == "A Only"|lettergroup == "A and B")&(role=="professor")))$gender)

## 
##       man nonbinary     woman 
##        79         0        68

On letter A, 68/147 = 46.3% of professors were women.

table(filter(df,((lettergroup == "B and C"|lettergroup == "B Only"|lettergroup == "A and B")&(role=="professor")))$gender)

## 
##       man nonbinary     woman 
##       333         0        47

On letter B, 47/380 = 12.4% of professors were women.

table(filter(df,((lettergroup == "C Only"|lettergroup == "B and C")&(role=="professor")))$gender)

## 
##       man nonbinary     woman 
##       145         0        37

On letter C, 37/182 = 20.3% of professors were women.

So while Letter A was signed by proportionally more female professors than the AMS figures would suggest, Letters B and C were generally reflective of the field, with C over-representing and B slightly under-representing the proportion of tenured professors who are women.
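The letter memberships used above (A: "A Only" or "A and B"; B: "B Only", "A and B" or "B and C"; C: "C Only" or "B and C") can be written once and reused with %in%, which avoids long chains of == comparisons. This is a base-R sketch; letter_groups, share_women_professors and the toy rows are hypothetical.

```r
# Letter-group labels (as in the summary table) that count as signing each letter
letter_groups <- list(
  A = c("A Only", "A and B"),
  B = c("B Only", "A and B", "B and C"),
  C = c("C Only", "B and C")
)

# Share of women among professors with a recorded gender who signed `letter`
share_women_professors <- function(df, letter) {
  keep <- df$lettergroup %in% letter_groups[[letter]] &
          df$role %in% "professor" & !is.na(df$gender)
  mean(df$gender[keep] == "woman")
}

toy_signers <- data.frame(
  lettergroup = c("A Only", "A and B", "B Only", "B and C", "C Only", "A Only"),
  role        = c("professor", "professor", "professor", "professor",
                  "professor", "grad"),
  gender      = c("woman", "man", "man", "woman", "man", "woman")
)

sapply(c("A", "B", "C"), share_women_professors, df = toy_signers)
```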

5.6 Age

What is the mean time since PhD of the signers of each letter? Are signers of B and C older, in this sense, than signers of A?

## [1] 14.6435

## [1] 27.75621

## [1] 35.48

## [1] 13

## [1] 27

## [1] 37

The mean time since PhD completion of signers of Letter A is 14.64 years and the median time is 13 years. The mean time since PhD completion of signers of Letter B is 27.76 years and the median time is 27 years. The mean time since PhD completion of signers of Letter C is 35.48 years and the median time is 37 years.

So signers of Letter C seem to be older than signers of Letter B, who in turn seem older than signers of letter A. Let’s validate this using a permutation test.

muA <- mean(filter(df, (lettergroup == "A Only"|lettergroup == "A and B"))$age, na.rm = TRUE)
muB <- mean(filter(df, (lettergroup == "B and C"|lettergroup == "B Only"|lettergroup == "A and B"))$age, na.rm = TRUE)
muC <- mean(filter(df, (lettergroup == "C Only"|lettergroup == "B and C"))$age, na.rm = TRUE)

val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC
set.seed(0)
dist <- meanPermutation(na.omit(df$age), 10000)

hist(dist,
     main = "Age",
     xlab = "Differences in Mean")
abline(v=val1, col = "red")
abline(v=val2, col = "blue")
abline(v=val3, col = "green")

All three observed differences in mean age lie outside the induced distribution, so it is unlikely that they were due to chance, and we can reject all three null hypotheses.

5.7 Citations

How do citation counts compare among signers of Letters A, B, and C? This is a trickier question, because many things influence how many citations a researcher has (age and field, for instance), and citation counts differ between Google Scholar, which includes preprints, and MathSciNet, which includes only published papers. We will subset accordingly and run permutation tests on each subset.

5.7.1 MathSciNet Citations

## [1] 424.6364

## [1] 1581.326

## [1] 2204.743

## [1] 299

## [1] 1025

## [1] 1392.5

Using MathSciNet, the mean number of citations for signers of Letter A is 424.64, and the median is 299. The mean number of citations for signers of Letter B is 1581.33, and the median is 1025. The mean number of citations for signers of Letter C is 2204.74, and the median is 1392.5. So, comparing populations directly, signers of Letter A appear to have fewer citations than their counterparts on B and C. Let’s validate this using a permutation test.

muA <- mean(filter(df, (lettergroup == "A Only"|lettergroup == "A and B"))$amscit, na.rm = TRUE)
muB <- mean(filter(df, (lettergroup == "B and C"|lettergroup == "B Only"|lettergroup == "A and B"))$amscit, na.rm = TRUE)
muC <- mean(filter(df, (lettergroup == "C Only"|lettergroup == "B and C"))$amscit, na.rm = TRUE)

val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC
set.seed(0)
dist <- meanPermutation(na.omit(df$amscit),10000)


hist(dist,
     main = "Permutation Test on MathSciNet citations",
     xlab = "Differences in Mean")
abline(v=val1, col = "red")
abline(v=val2, col = "blue")
abline(v=val3, col = "green")

## [1] 0

## [1] 0.0036

## [1] 0

The proportion of the induced distribution at least as extreme as the observed difference in mean citations between signers of B and C is 0.36%. The differences between A and B and between A and C both lie outside the induced distribution. So it is unlikely that the observed differences in the number of MathSciNet citations were due to chance, and we may reject all three null hypotheses.

5.7.2 MathSciNet Citations, Professors Only

## [1] 437.9062

## [1] 1571.051

## [1] 2176.642

## [1] 300

## [1] 971

## [1] 1353

The mean number of MathSciNet citations for professors who signed Letter A is 437.91, and the median is 300. The mean number of citations for professors who signed Letter B is 1571.05, and the median is 971. The mean number of citations for professors who signed Letter C is 2176.64, and the median is 1353.

So, comparing professors directly, those who signed Letter A appear to have fewer citations than their counterparts on B and C. Let’s validate this using a permutation test.

muA <- mean(filter(df, ((lettergroup == "A Only"|lettergroup == "A and B")&(role=="professor")))$amscit, na.rm = TRUE)
muB <- mean(filter(df, ((lettergroup == "B and C"|lettergroup == "B Only"|lettergroup == "A and B")&(role=="professor")))$amscit, na.rm = TRUE)
muC <- mean(filter(df, ((lettergroup == "C Only"|lettergroup == "B and C")&(role=="professor")))$amscit, na.rm = TRUE)

val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC

set.seed(0)
dist <- meanPermutation(na.omit(df$amscit),10000)

hist(dist,
     main = "Permutation Test on MathSciNet citations only professors",
     xlab = "Differences in Mean")
abline(v=val1, col = "red")
abline(v=val2, col = "blue")
abline(v=val3, col = "green")

## [1] 0

## [1] 0.0043

## [1] 0

The observed differences between A and B and between A and C lie outside the induced distribution. The percentage of the induced distribution that is more extreme than the observed difference in mean number of citations between B and C is 0.43%.

5.7.3 MathSciNet Citations per Year

Older researchers have had longer to accumulate citations, so it is important to normalize for this by dividing by the number of years since PhD.

## [1] 16.88398

## [1] 43.584

## [1] 55.5451

## [1] 13.08333

## [1] 26.76667

## [1] 42.41667

The mean number of citations per year for signers of Letter A is 16.88, and the median is 13.08. The mean number of citations per year for signers of Letter B is 43.58, and the median is 26.77. The mean number of citations per year for signers of Letter C is 55.55, and the median is 42.42.

So, comparing populations directly, signers of Letter A appear to have fewer citations per year than their counterparts on B and C. Let’s validate this using a permutation test.

muA <- mean(filter(df, (lettergroup == "A Only"|lettergroup == "A and B"))$amscitperyear, na.rm = TRUE)
muB <- mean(filter(df, (lettergroup == "B and C"|lettergroup == "B Only"|lettergroup == "A and B"))$amscitperyear, na.rm = TRUE)
muC <- mean(filter(df, (lettergroup == "C Only"|lettergroup == "B and C"))$amscitperyear, na.rm = TRUE)

val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC

set.seed(0)
dist <- meanPermutation(na.omit(df$amscitperyear),10000)

hist(dist,
     main = "Citations per Year Math Sci Net",
     xlab = "Differences in Mean")
abline(v=val1, col = "red")
abline(v=val2, col = "blue")
abline(v=val3, col = "green")

## [1] 0

## [1] 0.0479

## [1] 0

The percentage of the induced distribution that is more extreme than the observed difference in mean number of citations per year between A and B is 0%. The percentage of the induced distribution that is more extreme than the observed difference in mean number of citations per year between B and C is 4.79%. The observed difference in means between A and C lies outside the induced distribution. So it is unlikely that the observed differences in MathSciNet citations per year were due to chance, and we may reject all three null hypotheses.

5.7.4 MathSciNet Citations per Year, Professors Only

## [1] 16.88398

## [1] 44.99603

## [1] 55.3615

## [1] 13.08333

## [1] 27.475

## [1] 41.74468

The mean number of citations per year for professors who were signers of Letter A is 16.88, and the median is 13.08. The mean number of citations per year for professors who were signers of Letter B is 45.00, and the median is 27.48. The mean number of citations per year for professors who were signers of Letter C is 55.36, and the median is 41.74.

So, comparing only professors, signers of Letter A appear to have fewer citations per year than their counterparts on B and C. Let’s validate this using a permutation test.

muA <- mean(filter(df, ((lettergroup == "A Only"|lettergroup == "A and B")&(role=="professor")))$amscitperyear, na.rm = TRUE)
muB <- mean(filter(df, ((lettergroup == "B and C"|lettergroup == "B Only"|lettergroup == "A and B")&(role=="professor")))$amscitperyear, na.rm = TRUE)
muC <- mean(filter(df, ((lettergroup == "C Only"|lettergroup == "B and C")&(role=="professor")))$amscitperyear, na.rm = TRUE)

val1 <- muA-muB
val2 <- muB - muC
val3 <- muA - muC

set.seed(0)
dist <- meanPermutation(na.omit(df$amscitperyear),10000)

hist(dist,
     main = "AMS citations per year permutation test",
     xlab = "Differences in Mean")
abline(v=val1, col = "red")
abline(v=val2, col = "blue")
abline(v=val3, col = "green")

## [1] 0

## [1] 0.0773

## [1] 0

7.73% of the induced distribution was more extreme than the observed difference between B and C, so we fail to reject the null hypothesis for that difference. 0% of the induced distribution was more extreme than the observed differences between A and B and between A and C, so it is unlikely those were due to chance, and we may reject the remaining two null hypotheses.

5.7.5 MathSciNet Citations per Year, Female Professors Only

## [1] 15.30329

## [1] 25.34679

## [1] 35.04012

## [1] 5.545455

## [1] 17.49042

## [1] 22.70251

Using the data from MathSciNet, the mean number of citations per year for female professors who signed Letter A is 15.30, and the median is 5.55. The mean number of citations per year for female professors who signed Letter B is 25.35, and the median is 17.49. The mean number of citations per year for female professors who signed Letter C is 35.04, and the median is 22.70.

So, comparing only female professors, signers of Letter A appear to have fewer citations per year than their counterparts on B and C.

Let’s validate this using a permutation test.

muA <- mean(filter(df, ((lettergroup == "A Only"|lettergroup == "A and B")&(role=="professor")&(gender=="woman")))$amscitperyear, na.rm = TRUE)
muB <- mean(filter(df, ((lettergroup == "B and C"|lettergroup == "B Only"|lettergroup == "A and B")&(role=="professor")&(gender=="woman")))$amscitperyear, na.rm = TRUE)
muC <- mean(filter(df, ((lettergroup == "C Only"|lettergroup == "B and C")&(role=="professor")&(gender=="woman")))$amscitperyear, na.rm = TRUE)

val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC

set.seed(0)


dist <- meanPermutation(na.omit(filter(df, ((gender=="woman")&(role=="professor")))$amscitperyear),10000)

hist(dist,
     main = "MathSciNet citperyear Female Professors",
     xlab = "Differences in Mean")
abline(v=val1, col = "red")
abline(v=val2, col = "blue")
abline(v=val3, col = "green")

## [1] 0.1327

## [1] 0.1406

## [1] 0.0117

13.27% of the induced distribution was more extreme than the observed difference in means between A and B. 14.06% of the induced distribution was more extreme than the observed difference in means between B and C. 1.17% of the induced distribution was more extreme than the observed difference in means between A and C. We fail to reject the null hypotheses for the differences in means between female professors who signed Letters A and B and Letters B and C, but we may reject the null hypothesis for the difference in citations per year between female signers of A and C.

5.8 h-index

The h-index is an alternative to citation counts for assessing author impact. It attempts to balance the number of published papers against the number of citations: an author has index h if h of their papers have each been cited at least h times.
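The definition above can be made concrete with a short R function. This is an illustrative sketch; h_index is a hypothetical helper, and the h-index values in this study were collected from Google Scholar rather than recomputed.

```r
# h-index: the largest h such that h papers each have at least h citations
h_index <- function(cites) {
  sorted <- sort(cites, decreasing = TRUE)   # sort() also drops NAs
  # for decreasing counts, sorted[i] >= i holds exactly for i = 1..h
  sum(sorted >= seq_along(sorted))
}

h_index(c(10, 8, 5, 4, 3))   # 4
h_index(c(25, 8, 5, 3, 3))   # 3
```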

## [1] 9.738462

## [1] 22.26523

## [1] 32.2375

## [1] 6

## [1] 19

## [1] 29.5

The mean h-index for signers of Letter A is 9.74, and the median is 6. The mean h-index for signers of Letter B is 22.27, and the median is 19. The mean h-index for signers of Letter C is 32.24, and the median is 29.5.

It appears that the signers of letter A had a lower h-index than signers of B and C. Let’s assess this using a permutation test.

muA <- mean(filter(df, (lettergroup == "A Only"|lettergroup == "A and B"))$hindex, na.rm = TRUE)
muB <- mean(filter(df, (lettergroup == "B and C"|lettergroup == "B Only"|lettergroup == "A and B"))$hindex, na.rm = TRUE)
muC <- mean(filter(df, (lettergroup == "C Only"|lettergroup == "B and C"))$hindex, na.rm = TRUE)

val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC

set.seed(0)
dist <- meanPermutation(na.omit(df$hindex), 10000)

hist(dist,
     main = "hindex",
     xlab = "Differences in Mean")
abline(v=val1, col = "red")
abline(v=val2, col = "blue")
abline(v=val3, col = "green")

All three differences in mean h-index lie outside the induced distribution, so it is unlikely that the observed differences were due to chance. We may hence reject all three null hypotheses.

5.9 Control: Rutgers Math Department

As a control, we look at the Google Scholar citations, MathSciNet citations, the respective citations per year, and h-indices of the Rutgers math department.

##      AMSCIT            phd       GoogleScholar      hindex    
##  Min.   : 120.0   Min.   :1955   Min.   : 367   Min.   :10.0  
##  1st Qu.: 566.5   1st Qu.:1976   1st Qu.:1150   1st Qu.:15.5  
##  Median : 846.0   Median :1985   Median :2630   Median :26.0  
##  Mean   :1297.2   Mean   :1984   Mean   :3393   Mean   :27.3  
##  3rd Qu.:1770.5   3rd Qu.:1992   3rd Qu.:4457   3rd Qu.:34.0  
##  Max.   :5219.0   Max.   :2005   Max.   :8452   Max.   :50.0  
##                                  NA's   :23     NA's   :23    
##       age          citperyear     amscitperyear    
##  Min.   :15.00   Min.   : 15.29   Min.   :  4.679  
##  1st Qu.:27.50   1st Qu.: 43.25   1st Qu.: 16.585  
##  Median :35.00   Median : 93.95   Median : 26.537  
##  Mean   :35.72   Mean   :101.75   Mean   : 35.007  
##  3rd Qu.:44.50   3rd Qu.:149.85   3rd Qu.: 48.162  
##  Max.   :65.00   Max.   :237.11   Max.   :136.286  
##                  NA's   :23

The mean time since PhD for Rutgers faculty was 35.72 years. The mean h-index was 27.3. The mean number of Google Scholar citations was 3392.8, and the mean number of MathSciNet citations was 1297.19. The mean number of Google Scholar citations per year was 101.75, and the mean number of MathSciNet citations per year was 35.01.

These figures are in line with those of signers of Letter B, greater than those of signers of Letter A, and slightly less than those of signers of Letter C.

5.10 Asian- and Eastern European-Born Signers

We are interested in the proportion of letter signers who are math professors at R1 universities and who were born in Eastern European or Asian countries. We compiled these data from Wikipedia and personal knowledge.

Summary Statistics.

##       China     Croatia       Czech      Greece     Hungary        Iran 
##          18           1           1           1           4           1 
##       Japan       Korea Netherlands      Poland     Romania      Russia 
##           1           1           1           3           7          67 
##      Serbia      Turkey        NA's 
##           2           2         246

table(filter(df3, (Letter == "A Only"|Letter == "A and B"))$From)

## 
##       China     Croatia       Czech      Greece     Hungary        Iran 
##           0           0           0           0           0           0 
##       Japan       Korea Netherlands      Poland     Romania      Russia 
##           0           0           0           0           0           0 
##      Serbia      Turkey 
##           0           0

No signers of Letter A who were math professors at R1 universities were from Eastern European or Asian countries.

table(filter(df3, (Letter == "B and C"|Letter == "B Only"|Letter == "A and B"))$From)

## 
##       China     Croatia       Czech      Greece     Hungary        Iran 
##           6           1           1           1           0           1 
##       Japan       Korea Netherlands      Poland     Romania      Russia 
##           1           0           1           3           4          50 
##      Serbia      Turkey 
##           2           1

Looking at signers of Letter B who were math professors at R1 universities, we find there were 6 native Chinese, 1 native Croatian, 1 native Czech, 1 native Greek, 1 native Iranian, 1 native Japanese, 1 native Dutch, 3 native Poles, 4 native Romanians, 50 native Russians, 2 native Serbians, and 1 native Turk.

table(filter(df3, (Letter == "C Only"|Letter == "B and C"))$From)

## 
##       China     Croatia       Czech      Greece     Hungary        Iran 
##          12           0           1           1           4           0 
##       Japan       Korea Netherlands      Poland     Romania      Russia 
##           0           1           0           0           4          36 
##      Serbia      Turkey 
##           0           1

Looking at signers of Letter C who were math professors at R1 universities, we find there were 12 native Chinese, 1 native Czech, 1 native Greek, 4 native Hungarians, 1 native Korean, 4 native Romanians, 36 native Russians, and 1 native Turk.

5.11 AMS Fellows

How many signers were AMS Fellows?

table(filter(df, ((lettergroup == "A Only"|lettergroup == "A and B")
                  &(role=="professor")
                  &(field=="math")
                  &(institution=="domesticr1")))$fellows)

## 
## FALSE  TRUE 
##    32     6

9 signers of Letter A were Fellows of the AMS, 6 of whom were math professors at domestic R1 universities.

table(filter(df, (lettergroup == "B and C"|lettergroup == "B Only"|lettergroup == "A and B")&(role=="professor")
                  &(field=="math")
                  &(institution=="domesticr1"))$fellows)

## 
## FALSE  TRUE 
##   116    94

136 signers of Letter B were Fellows of the AMS, 94 of whom were math professors at domestic R1 universities.

table(filter(df, (lettergroup == "C Only"|lettergroup == "B and C")&(role=="professor")
                  &(field=="math")
                  &(institution=="domesticr1"))$fellows)

## 
## FALSE  TRUE 
##    56    81

103 signers of Letter C were Fellows of the AMS, 81 of whom were math professors at domestic R1 universities.

6 Conclusion and Discussion

We see the following patterns among the “established” mathematicians who signed the three letters: the citation distribution of the signers of Letter A is similar to that of a mid-level mathematics department (such as, say, Temple University), the citation metrics of the signers of Letter B are closer to those of a top-20 department such as Rutgers University, while the citation metrics of the signers of Letter C are another tier higher and more akin to the distribution of metrics for a truly top department. These differences persist if we restrict our study to female mathematicians only.

What explains these results? The difference in tone between the three letters can be summarized as follows: Letter A is the least meritocratic, suggesting the value of using political factors in preference to mathematical merit. Letter B takes no position on meritocracy (though it is clearly opposed to the greater politicization of academic life), and Letter C is both pro-meritocracy and pro-diversity. It is then not surprising that the people with the most “merit” in the judgement of the community support the most meritocratic letters, so the cynical analysis is simply that people are, as they say in finance, “talking their book”: advocating policies which are most advantageous to them and to people like them. However, it is not quite so simple.

Notice that mathematicians from Communist countries (Eastern Europe and China) are considerably overrepresented among the signers of Letters B and C, and are not represented at all among the signers of Letter A. The reason for this is that it was very common for talented people in those countries to go into mathematics because it was the least political of the sciences (no expensive equipment or large research teams are generally needed). All of those countries had severe discrimination against some ethnic and social groups in favor of supposedly disadvantaged groups. Many of the signers were so talented that they could overcome the discrimination, but they generally had many friends, only a little less spectacular, who had to settle for working outside mathematics (or doing mathematics as a hobby). As a result, the very thought of bringing more exogenous politics (as opposed to the usual, unavoidable “academic” politics) into the field fills them with revulsion.

It is thus not correct to view Letters B and C, as Topaz et al. [8] do, as the voice of the “power elite”: the people described above generally arrived in the US with nothing but the shirts on their backs and got ahead through sheer talent and hard work.

7 Bibliography

[1] https://www.ams.org/journals/notices/201911/rnoti-p1778.pdf

[2] https://www.ams.org/journals/notices/202001/rnoti-o1.pdf

[3] https://qsideinstitute.org/2019/11/19/diversity-statements-in-hiring-the-american-mathematical-society-and-uc-davis/

[4] https://pypi.org/project/scholarly/

[5] https://scholar.google.com/

[6] https://genealogy.math.ndsu.nodak.edu/

[7] https://github.com/j2kun/math-genealogy-scraper

[8] https://qsideinstitute.org/download/ams-letters-study/

[9] http://www.ams.org/profession/data/annual-survey/2016dp-tableDF1.pdf

8 Appendix

8.1 Data and Code

All data and code are available at https://github.com/joshp112358/Notices

8.2 Google Scholar Citations

## [1] 947.7333

## [1] 3482.756

## [1] 6074.037

## [1] 161

## [1] 1493

## [1] 3307

The mean number of citations for signers of Letter A is 947.73, and the median is 161. The mean number of citations for signers of Letter B is 3482.76, and the median is 1493. The mean number of citations for signers of Letter C is 6074.04, and the median is 3307.
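Judging from the appendix code, each letter’s summary pools every signer of that letter, so a person who signed two letters counts toward both. A minimal base-R sketch of that bookkeeping follows. The data frame `toy`, the list `letter_groups`, and the helper `summarise_letter()` are hypothetical; the group labels are assumed to be exactly those used in the appendix code.

```r
# Hypothetical toy data: column names mirror the document's df
# (lettergroup, citations), but the values are invented for illustration.
toy <- data.frame(
  lettergroup = c("A Only", "A and B", "B Only", "B and C", "C Only", "A Only"),
  citations   = c(100, 300, 1500, 4000, 3000, NA)
)

# Membership sets: a signer of several letters counts toward each of them.
# Assumption: these five labels are the only ones present.
letter_groups <- list(
  A = c("A Only", "A and B"),
  B = c("B Only", "A and B", "B and C"),
  C = c("C Only", "B and C")
)

# Mean and median of `column` for each letter, ignoring missing values.
summarise_letter <- function(data, groups, column) {
  t(sapply(groups, function(g) {
    x <- data[[column]][data$lettergroup %in% g]
    c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
  }))
}

summarise_letter(toy, letter_groups, "citations")
```

In the toy data the “A and B” signer (300 citations) enters both the A row and the B row of the result.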

Comparing the populations directly, signers of Letter A had fewer citations than signers of Letters B and C, and signers of B had fewer citations than signers of C.

Let’s validate this using a permutation test.

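library(dplyr)

# Mean citations per letter; a signer of several letters counts toward each one
muA <- mean(filter(df, lettergroup %in% c("A Only", "A and B"))$citations, na.rm = TRUE)
muB <- mean(filter(df, lettergroup %in% c("B Only", "A and B", "B and C"))$citations, na.rm = TRUE)
muC <- mean(filter(df, lettergroup %in% c("C Only", "B and C"))$citations, na.rm = TRUE)

# Observed differences in means
val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC

# Null distribution of differences in means (meanPermutation is defined elsewhere in the document)
set.seed(0)
dist <- meanPermutation(na.omit(df$citations), 10000)

# Null distribution with observed differences: red = A - B, blue = B - C, green = A - C
hist(dist,
     main = "Permutation Test on GS citations",
     xlab = "Differences in Mean")
abline(v = val1, col = "red")
abline(v = val2, col = "blue")
abline(v = val3, col = "green")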

All three observed differences in mean citations lie outside the induced distribution, so it is unlikely that they arose by chance. We therefore reject all three null hypotheses and conclude that signers of Letter A had fewer citations than signers of B and C, and that signers of B had fewer citations than signers of C.
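`meanPermutation()` is defined elsewhere in the document and is not reproduced in this section. For comparison, here is a self-contained two-sample permutation test for a difference in means in base R. The function `perm_test_mean()` is a hypothetical helper, not the authors’ code. It shuffles the pooled values, re-splits them at the observed group sizes, and reports a two-sided p-value with the common +1 correction.

```r
# Minimal two-sample permutation test for a difference in means.
# Hypothetical helper, not the document's meanPermutation().
perm_test_mean <- function(x, y, n_perm = 10000) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  pooled <- c(x, y)
  n_x <- length(x)
  observed <- mean(x) - mean(y)
  # Under the null hypothesis the group labels are exchangeable:
  # shuffle the pooled values and split them at the original group sizes.
  null_diffs <- replicate(n_perm, {
    idx <- sample.int(length(pooled), n_x)
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # Two-sided p-value with the +1 correction, so it is never exactly 0.
  p <- (sum(abs(null_diffs) >= abs(observed)) + 1) / (n_perm + 1)
  list(observed = observed, null = null_diffs, p_value = p)
}

set.seed(0)
res <- perm_test_mean(c(1, 2, 3, 4, 5), c(11, 12, 13, 14, 15), n_perm = 2000)
res$observed
res$p_value
```

In this toy example the observed difference is -10. Only the original split and its mirror image are that extreme (2 of the 252 possible splits), so the Monte Carlo p-value should come out small, near 0.008.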

8.3 Google Scholar Citations Only Professors

It is difficult to compare graduate students or recent graduates with professors, so we subset the data and repeat the analysis for professors only.
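When the analysis is restricted to professors, the observed difference and the pool that the permutation test shuffles should both come from the same subset. Below is a hypothetical base-R helper, `letter_vectors()`, that extracts each letter’s values for an arbitrary row filter. The toy data are invented; the column names follow the appendix code.

```r
# Hypothetical helper (not from the document): for each letter, return the
# non-missing values of `column` among the rows selected by `keep`.
letter_vectors <- function(data, column, keep = rep(TRUE, nrow(data))) {
  groups <- list(
    A = c("A Only", "A and B"),
    B = c("B Only", "A and B", "B and C"),
    C = c("C Only", "B and C")
  )
  lapply(groups, function(g) {
    x <- data[[column]][keep & data$lettergroup %in% g]
    x[!is.na(x)]
  })
}

# Invented toy data using the appendix column names.
toy <- data.frame(
  lettergroup = c("A Only", "A and B", "B Only", "C Only", "B and C", "A Only"),
  role        = c("professor", "student", "professor", "professor", "professor", "postdoc"),
  citations   = c(900, 40, 2000, 3300, 5000, 120)
)

# Professors only: the student and postdoc rows are dropped from every letter.
profs <- letter_vectors(toy, "citations", keep = toy$role == "professor")
profs
```

For a professors-only comparison, the permutation pool would then be drawn from these vectors rather than from `df$citations` for all signers.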

The mean number of citations for professors who were signers of Letter A is 2397.75, and the median is 954. The mean number of citations for professors who were signers of Letter B is 4136.43, and the median is 1923. The mean number of citations for professors who were signers of Letter C is 6226.82, and the median is 3307.

Comparing the populations directly, professors who signed Letter A had fewer citations than their counterparts on B and C. Let’s validate this using a permutation test.

# Mean citations per letter, professors only
muA <- mean(filter(df, lettergroup %in% c("A Only", "A and B"), role == "professor")$citations, na.rm = TRUE)
muB <- mean(filter(df, lettergroup %in% c("B Only", "A and B", "B and C"), role == "professor")$citations, na.rm = TRUE)
muC <- mean(filter(df, lettergroup %in% c("C Only", "B and C"), role == "professor")$citations, na.rm = TRUE)

# Observed differences in means
val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC

# Null distribution: drawn here from all signers' citations, not professors only
set.seed(0)
dist <- meanPermutation(na.omit(df$citations), 10000)

# Null distribution with observed differences: red = A - B, blue = B - C, green = A - C
hist(dist,
     main = "Permutation Test on GS citations only professors",
     xlab = "Differences in Mean")
abline(v = val1, col = "red")
abline(v = val2, col = "blue")
abline(v = val3, col = "green")

## [1] 0

## [1] 2e-04

## [1] 0

Among professors, only 0.02% of the induced distribution is more extreme than the observed difference in mean citations between signers of B and C, and the differences between A and B and between A and C both lie outside the induced distribution. So it is unlikely that the observed differences in Google Scholar citations were due to chance, and we may reject all three null hypotheses.
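Two of the p-values above are reported as exactly 0. With 10,000 permutations, this means that no permuted difference was as extreme as the observed one, so the p-value is below roughly 1/10,001 rather than literally zero. The hypothetical helper below returns both the raw tail fraction and the +1-corrected estimate. The corrected form is a common convention, not necessarily what the authors computed. The lower-tail default is an assumption based on the text’s “more extreme (less)” wording.

```r
# Hypothetical helper: tail fraction of a permutation null distribution.
# Default is one-sided (lower tail), assumed to match the document's usage.
perm_p_value <- function(null_diffs, observed, two_sided = FALSE) {
  B <- length(null_diffs)
  hits <- if (two_sided) {
    sum(abs(null_diffs) >= abs(observed))  # either tail
  } else {
    sum(null_diffs <= observed)            # lower tail only
  }
  c(raw = hits / B, corrected = (hits + 1) / (B + 1))
}

perm_p_value(c(-1, 0, 1, 2), observed = -5)
# raw = 0, corrected = 0.2
```

The corrected value can never be 0, which is why it is often preferred when reporting results from a finite number of permutations.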

8.4 Google Scholar Citations per Year

## [1] 69.2537

## [1] 124.6823

## [1] 176.9885

## [1] 22.9011

## [1] 48.72857

## [1] 116.6667

The mean number of citations per year for signers of Letter A is 69.25, and the median is 22.90. The mean number of citations per year for signers of Letter B is 124.68, and the median is 48.73. The mean number of citations per year for signers of Letter C is 176.99, and the median is 116.67.

Comparing the populations directly, signers of Letter A had fewer citations per year than their counterparts on B and C. Let’s validate this using a permutation test.

# Mean citations per year for each letter (all signers)
muA <- mean(filter(df, lettergroup %in% c("A Only", "A and B"))$citperyear, na.rm = TRUE)
muB <- mean(filter(df, lettergroup %in% c("B Only", "A and B", "B and C"))$citperyear, na.rm = TRUE)
muC <- mean(filter(df, lettergroup %in% c("C Only", "B and C"))$citperyear, na.rm = TRUE)

# Observed differences in means
val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC

set.seed(0)
dist <- meanPermutation(na.omit(df$citperyear), 10000)

# Null distribution with observed differences: red = A - B, blue = B - C, green = A - C
hist(dist,
     main = "Citations per Year Google Scholar",
     xlab = "Differences in Mean")
abline(v = val1, col = "red")
abline(v = val2, col = "blue")
abline(v = val3, col = "green")

## [1] 0.0186

## [1] 0.0264

## [1] 0

The percentage of the induced distribution that is more extreme than the observed difference in mean citations per year is 1.86% between A and B and 2.64% between B and C. The observed difference in means between A and C lies outside the induced distribution. So it is unlikely that the observed differences in Google Scholar citations per year were due to chance, and we may reject all three null hypotheses.
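This section does not spell out how `citperyear` is computed. One natural definition divides total citations by years since the PhD; that definition, the reference year 2019, and the one-year floor are all assumptions used here purely for illustration.

```r
# Assumed definition (not confirmed by the document):
# citations per year = citations / max(ref_year - phd_year, 1)
cit_per_year <- function(citations, phd_year, ref_year = 2019) {
  years <- pmax(ref_year - phd_year, 1)  # floor at 1 year to avoid dividing by 0
  citations / years
}

cit_per_year(c(3000, 150, 40), phd_year = c(1989, 2009, 2019))
# 100 15 40
```

Normalizing by career length in this way lets signers at different career stages be placed on one scale, which is the reason to look at citations per year alongside raw citation counts.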

8.5 Google Scholar Citations per Year only professors

## [1] 139.2636

## [1] 142.3937

## [1] 182.7848

## [1] 47.56066

## [1] 74.8125

## [1] 111.7333

The mean number of citations per year for professors who were signers of Letter A is 139.26, and the median is 47.56. The mean number of citations per year for professors who were signers of Letter B is 142.39, and the median is 74.81. The mean number of citations per year for professors who were signers of Letter C is 182.78, and the median is 111.73.

Comparing the populations directly, professors who signed Letter A had slightly fewer citations per year than professors who signed Letter B (mean 139.26 vs. 142.39), and fewer than those who signed Letter C.

Let’s validate this using a permutation test.

# Mean citations per year for each letter, professors only
muA <- mean(filter(df, lettergroup %in% c("A Only", "A and B"), role == "professor")$citperyear, na.rm = TRUE)
muB <- mean(filter(df, lettergroup %in% c("B Only", "A and B", "B and C"), role == "professor")$citperyear, na.rm = TRUE)
muC <- mean(filter(df, lettergroup %in% c("C Only", "B and C"), role == "professor")$citperyear, na.rm = TRUE)

# Observed differences in means
val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC

# Null distribution: drawn here from all signers' citations per year, not professors only
set.seed(0)
dist <- meanPermutation(na.omit(df$citperyear), 10000)

# Null distribution with observed differences: red = A - B, blue = B - C, green = A - C
hist(dist,
     main = "GS citations per year Professors Only",
     xlab = "Differences in Mean")
abline(v = val1, col = "red")
abline(v = val2, col = "blue")
abline(v = val3, col = "green")

## [1] 0.4626

## [1] 0.0778

## [1] 0.0606

About 46.26% of the induced distribution is more extreme (lower) than the observed difference between A and B, so this difference could easily have arisen by chance, and we fail to reject the null hypothesis that \(\mu(A)=\mu(B)\). Only 7.78% of the induced distribution is more extreme than the observed difference between B and C, and 6.06% than the difference between A and C, so these differences are unlikely to be due to chance. However, they are significant only at the 10% level, not at the conventional 5% level, so we reject the null hypotheses \(\mu(B)=\mu(C)\) and \(\mu(C)=\mu(A)\) with less certainty than for the observed difference in age.
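Whether 0.0778 and 0.0606 count as rejections depends on the significance level chosen. Checking the three reported p-values against the two most common thresholds makes the distinction explicit. The p-values are copied from the output above; the thresholds are conventional choices and are not stated by the authors.

```r
# One-sided permutation p-values reported above (professors, GS citations per year)
p <- c(A_vs_B = 0.4626, B_vs_C = 0.0778, A_vs_C = 0.0606)

# Decision at two conventional significance levels
decisions <- data.frame(
  p_value        = p,
  reject_at_0.05 = p < 0.05,
  reject_at_0.10 = p < 0.10
)
decisions
```

None of the three comparisons is rejected at the 5% level; the B versus C and A versus C comparisons are rejected at the 10% level.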

8.6 Google Scholar Citations Per Year - Female Professors Only

## [1] 58.24863

## [1] 81.4338

## [1] 72.29003

## [1] 24.27778

## [1] 54.16667

## [1] 64.56522

The mean number of citations per year for female professors who were signers of Letter A is 58.25, and the median is 24.28. The mean number of citations per year for female professors who were signers of Letter B is 81.43, and the median is 54.17. The mean number of citations per year for female professors who were signers of Letter C is 72.29, and the median is 64.56.

Comparing only female professors, signers of Letter A had fewer citations per year on average than their counterparts on B and C, while signers of B had more citations per year on average than those on C.

Let’s validate this using a permutation test.

# Mean citations per year for each letter, female professors only
muA <- mean(filter(df, lettergroup %in% c("A Only", "A and B"), role == "professor", gender == "woman")$citperyear, na.rm = TRUE)
muB <- mean(filter(df, lettergroup %in% c("B Only", "A and B", "B and C"), role == "professor", gender == "woman")$citperyear, na.rm = TRUE)
muC <- mean(filter(df, lettergroup %in% c("C Only", "B and C"), role == "professor", gender == "woman")$citperyear, na.rm = TRUE)

# Observed differences in means
val1 <- muA - muB
val2 <- muB - muC
val3 <- muA - muC

# Null distribution drawn from female professors only
set.seed(0)
dist <- meanPermutation(na.omit(filter(df, gender == "woman", role == "professor")$citperyear), 10000)

# Null distribution with observed differences: red = A - B, blue = B - C, green = A - C
hist(dist,
     main = "GS citperyear Female Professors",
     xlab = "Differences in Mean")
abline(v = val1, col = "red")
abline(v = val2, col = "blue")
abline(v = val3, col = "green")

## [1] 0.1938

## [1] 0.6239

## [1] 0.3156

19.38% of the induced distribution was more extreme than the observed difference in means between A and B, 62.39% than the difference between B and C, and 31.56% than the difference between A and C. None of these differences is distinguishable from chance, so we fail to reject any of the three null hypotheses.
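Failing to reject is not evidence that the means are equal. With only a handful of female professors per letter, a permutation test may have little power to detect a difference. Counting how many rows each letter contributes to a subset is a quick check. The sketch below uses invented toy data with the appendix column names and a hypothetical `letter_groups` list.

```r
# Invented toy data using the appendix column names.
toy <- data.frame(
  lettergroup = c("A Only", "A Only", "A and B", "B Only", "B and C", "C Only", "C Only"),
  role        = c("professor", "professor", "professor", "student", "professor", "professor", "professor"),
  gender      = c("woman", "man", "woman", "woman", "woman", "woman", "woman")
)

# Hypothetical membership sets; overlapping signers count toward each letter.
letter_groups <- list(
  A = c("A Only", "A and B"),
  B = c("B Only", "A and B", "B and C"),
  C = c("C Only", "B and C")
)

# Rows in the subset of interest (female professors)
keep <- toy$role == "professor" & toy$gender == "woman"

# Number of subset members contributing to each letter's mean
sizes <- sapply(letter_groups, function(g) sum(keep & toy$lettergroup %in% g))
sizes
```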

Citations Analysis of Letter Signers

Joshua Paik and Igor Rivin
