What?

Does the game of science reward researchers who use more reasonable sample sizes, and therefore publish more replicable results, for example with more citations? Let's have a look.

Data

The data cover 71 eminent social psychologists and come from the Replicability-Index blog run by Ulrich Schimmack (https://replicationindex.wordpress.com/2018/11/08/replicability-rankings-of-eminent-social-psychologists/). They come with the usual caveats for this kind of analysis:

The results should not be overinterpreted. They are estimates based on an objective statistical procedure, but no statistical method can compensate perfectly for the various practices that led to the observed distribution of p-values (transformed into z-scores). However, in the absence of any information which results can be trusted, these graphs provide some information. How this information is used by consumers depends ultimately on consumers’ subjective beliefs. Information about the average replicability of researchers’ published results may influence these beliefs.

The R-Index (replicability index) is defined as:

R-Index = Median Observed Power - Inflation

Inflation = Percentage of Significant Results - Median Observed Power

See details in https://replicationindex.wordpress.com/2016/01/31/a-revised-introduction-to-the-r-index/.
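Substituting the second definition into the first gives R-Index = 2 × Median Observed Power - Percentage of Significant Results. The sketch below illustrates that arithmetic. It is a simplification, not Schimmack's exact procedure: it assumes every result is a two-sided z-test at alpha = .05, so each p-value can be converted to a z-score and then to observed power, and it works in proportions rather than percentages. The helper names observed_power() and r_index() and the example p-values are hypothetical.

```r
# Hypothetical helper: observed (post-hoc) power of a two-sided z-test,
# computed from a reported p-value. Assumes z-tests at alpha = .05 throughout.
observed_power <- function(p, alpha = 0.05) {
  z <- qnorm(1 - p / 2)              # p-value -> absolute z-score
  crit <- qnorm(1 - alpha / 2)       # critical value, about 1.96
  pnorm(z - crit) + pnorm(-z - crit) # probability of |Z| > crit given z
}

# Hypothetical helper: R-Index from a vector of p-values, as a proportion
r_index <- function(p, alpha = 0.05) {
  mop <- median(observed_power(p, alpha)) # median observed power
  success_rate <- mean(p < alpha)         # share of significant results
  inflation <- success_rate - mop         # excess of successes over power
  mop - inflation
}

# Illustrative, made-up p-values: three significant, one not
p_example <- c(0.01, 0.02, 0.04, 0.30)
r_index(p_example)
```

With these four p-values the median observed power is about .59 and the success rate is .75, so inflation is about .16 and the R-Index roughly .43. A result with p exactly .05 has observed power of about .50, which is why sets of barely significant results get low R-Index values.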

Init

options(digits = 2)  # print numbers with 2 significant digits
library(pacman)      # p_load() installs missing packages, then loads them
p_load(kirkegaard, googlesheets, rms)

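# note: googlesheets has since been retired; googlesheets4 is its successor
gs_auth()  # authenticate with Google
gs = gs_url("https://docs.google.com/spreadsheets/d/17qaOuR1AHg65torM9hwQ49jhH-9M74DE4bU-kBie8Ig/edit#gid=0")  # register the sheet by its URL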
## Sheet-identifying info appears to be a browser URL.
## googlesheets will attempt to extract sheet key from the URL.
## Putative key: 17qaOuR1AHg65torM9hwQ49jhH-9M74DE4bU-kBie8Ig
## Sheet successfully identified: "Replicability Rankings of Eminent Social Psychologists"
d = gs_read(gs) %>% df_legalize_names()  # read the sheet, make column names syntactic (e.g. R_Index_all)
## Accessing worksheet titled 'data'.
## Parsed with column specification:
## cols(
##   rank = col_integer(),
##   researcher = col_character(),
##   `R-Index (all)` = col_integer(),
##   `R-Index (2.0-2.5)` = col_integer(),
##   `R-Index (2.5-3.0)` = col_integer(),
##   `#P-vals` = col_integer(),
##   `H-Index` = col_integer(),
##   publications = col_integer(),
##   citations_thousands = col_integer()
## )
d$citations_thousands_pp = d$citations_thousands / d$publications  # thousands of citations per publication

Analyses

# correlation matrix of all variables except rank and researcher (columns 1-2)
wtd.cors(d[-c(1:2)])
##                        R_Index_all R_Index_2_0_2_5 R_Index_2_5_3_0  P_vals
## R_Index_all                  1.000           0.620           0.859  0.0931
## R_Index_2_0_2_5              0.620           1.000           0.757  0.0212
## R_Index_2_5_3_0              0.859           0.757           1.000  0.0428
## P_vals                       0.093           0.021           0.043  1.0000
## H_Index                      0.204           0.058           0.119  0.0046
## publications                 0.228           0.097           0.212  0.0360
## citations_thousands          0.168           0.087           0.150 -0.0779
## citations_thousands_pp      -0.062           0.105           0.013 -0.2261
##                        H_Index publications citations_thousands
## R_Index_all             0.2041        0.228               0.168
## R_Index_2_0_2_5         0.0581        0.097               0.087
## R_Index_2_5_3_0         0.1192        0.212               0.150
## P_vals                  0.0046        0.036              -0.078
## H_Index                 1.0000        0.793               0.875
## publications            0.7930        1.000               0.655
## citations_thousands     0.8746        0.655               1.000
## citations_thousands_pp  0.0912       -0.305               0.438
##                        citations_thousands_pp
## R_Index_all                            -0.062
## R_Index_2_0_2_5                         0.105
## R_Index_2_5_3_0                         0.013
## P_vals                                 -0.226
## H_Index                                 0.091
## publications                           -0.305
## citations_thousands                     0.438
## citations_thousands_pp                  1.000
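To put "weak" into numbers, the sketch below computes an approximate 95% confidence interval (via Fisher's z-transform) and a two-sided p-value for a correlation, plugging in r = 0.204 between R_Index_all and H_Index and n = 71 from the output above. It assumes wtd.cors() returns ordinary Pearson correlations here, since no weights were supplied; the helper names cor_ci() and cor_p() are hypothetical.

```r
# Approximate confidence interval for a Pearson correlation via Fisher's z
cor_ci <- function(r, n, level = 0.95) {
  z <- atanh(r)                       # Fisher z-transform of r
  se <- 1 / sqrt(n - 3)               # standard error of z
  crit <- qnorm(1 - (1 - level) / 2)  # two-sided normal critical value
  tanh(z + c(-1, 1) * crit * se)      # back-transform the limits to r
}

# Two-sided p-value for H0: rho = 0, using the t distribution with n - 2 df
cor_p <- function(r, n) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t), df = n - 2)
}

# r(R_Index_all, H_Index) and sample size taken from the output above
ci <- cor_ci(0.204, 71)
p  <- cor_p(0.204, 71)
ci
p
```

For r = .204 and n = 71 the interval runs from about -.03 to .42 and p is about .09, so even this correlation cannot be distinguished from zero at the .05 level. The same functions can be applied to any other cell of the matrix.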
# scatterplots of R-Index against each success metric, points labelled by researcher
GG_scatter(d, "R_Index_all", "H_Index", case_names = "researcher")

GG_scatter(d, "R_Index_all", "citations_thousands", case_names = "researcher")

GG_scatter(d, "R_Index_all", "citations_thousands_pp", case_names = "researcher")

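# OLS regressions (rms::ols) of each success metric on R-Index, controlling for publications and number of p-values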
ols(H_Index ~ R_Index_all + publications + P_vals, data = d)
## Linear Regression Model
##  
##  ols(formula = H_Index ~ R_Index_all + publications + P_vals, 
##      data = d)
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs      71    LR chi2     70.59    R2       0.630    
##  sigma8.6979    d.f.            3    R2 adj   0.613    
##  d.f.     67    Pr(> chi2) 0.0000    g       11.506    
##  
##  Residuals
##  
##       Min       1Q   Median       3Q      Max 
##  -22.3040  -4.8888  -0.1642   5.1334  20.2419 
##  
##  
##               Coef    S.E.   t     Pr(>|t|)
##  Intercept    25.5640 7.2548  3.52 0.0008  
##  R_Index_all   0.0413 0.1193  0.35 0.7302  
##  publications  0.1492 0.0145 10.32 <0.0001 
##  P_vals       -0.0007 0.0019 -0.35 0.7266  
## 
ols(citations_thousands ~ R_Index_all + publications + P_vals, data = d)
## Linear Regression Model
##  
##  ols(formula = citations_thousands ~ R_Index_all + publications + 
##      P_vals, data = d)
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs      71    LR chi2     41.21    R2       0.440    
##  sigma5.9395    d.f.            3    R2 adj   0.415    
##  d.f.     67    Pr(> chi2) 0.0000    g        5.321    
##  
##  Residuals
##  
##       Min       1Q   Median       3Q      Max 
##  -12.4689  -3.5479  -0.3465   3.7154  19.0317 
##  
##  
##               Coef    S.E.   t     Pr(>|t|)
##  Intercept     2.0144 4.9541  0.41 0.6856  
##  R_Index_all   0.0252 0.0815  0.31 0.7584  
##  publications  0.0686 0.0099  6.95 <0.0001 
##  P_vals       -0.0015 0.0013 -1.13 0.2610  
## 
ols(citations_thousands_pp ~ R_Index_all + publications + P_vals, data = d)
## Linear Regression Model
##  
##  ols(formula = citations_thousands_pp ~ R_Index_all + publications + 
##      P_vals, data = d)
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs      71    LR chi2     10.70    R2       0.140    
##  sigma0.0406    d.f.            3    R2 adj   0.101    
##  d.f.     67    Pr(> chi2) 0.0135    g        0.018    
##  
##  Residuals
##  
##       Min       1Q   Median       3Q      Max 
##  -0.07334 -0.02944 -0.00773  0.02088  0.15232 
##  
##  
##               Coef    S.E.   t     Pr(>|t|)
##  Intercept     0.1217 0.0339  3.59 0.0006  
##  R_Index_all   0.0001 0.0006  0.24 0.8128  
##  publications -0.0002 0.0001 -2.60 0.0113  
##  P_vals        0.0000 0.0000 -1.91 0.0600  
## 
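The p-values alone do not show how large an R-Index effect the data can rule out. The sketch below turns the printed R_Index_all estimates and standard errors from the three models into 95% confidence intervals, using the t distribution with the models' 67 residual d.f., and scales them to a 10-point R-Index difference. The numbers are copied by hand from the output above; the data frame and column names are illustrative.

```r
# R_Index_all estimates and standard errors copied from the three models above
est <- data.frame(
  outcome = c("H_Index", "citations_thousands", "citations_thousands_pp"),
  coef    = c(0.0413, 0.0252, 0.0001),
  se      = c(0.1193, 0.0815, 0.0006)
)
df_resid <- 67                     # residual d.f. reported for each model
crit <- qt(0.975, df = df_resid)   # two-sided 95% t critical value (about 2.0)

# Confidence limits per R-Index point
est$lower <- est$coef - crit * est$se
est$upper <- est$coef + crit * est$se

# Implied difference in each outcome for a 10-point higher R-Index
est$lower_10 <- 10 * est$lower
est$upper_10 <- 10 * est$upper
est
```

All three intervals include zero. For H-index the interval is roughly -0.20 to 0.28 per R-Index point, so a 10-point higher R-Index is compatible with anything from about 2 H-index points lower to about 3 higher. The citations_thousands_pp row is only approximate because its printed estimates are rounded to four decimals.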

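Thus, the R-Index has only weak bivariate correlations with the success metrics (r = .17 to .23 with H-index, publications and citations, and -.06 with citations per publication), and no detectable relationship once publications and number of p-values are controlled for (p > .7 for R-Index in all three models). By these metrics, the game of science does not appear to reward scientists for higher-quality scientific work.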
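Why does the bivariate correlation vanish in the regressions? A partial correlation computed from the correlation matrix gives a quick check. The sketch below controls only for publications and ignores P_vals, a simplification, so it approximates the H-index regression rather than reproducing it; the helper name partial_cor() is hypothetical.

```r
# First-order partial correlation of x and y controlling for z,
# computed from the three pairwise correlations
partial_cor <- function(r_xy, r_xz, r_yz) {
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# x = R_Index_all, y = H_Index, z = publications (values from the matrix above)
pc <- partial_cor(r_xy = 0.204, r_xz = 0.228, r_yz = 0.793)
pc
```

This gives roughly .04. R-Index correlates .23 with publications, and publications correlate .79 with H-index, so nearly all of the .20 bivariate correlation is accounted for by publication count.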

# print the full data set
d