Does the game of science reward researchers who use more reasonable sample sizes, for example with more citations? Let’s have a look.
The data cover 71 prominent social psychologists and come from the Replicability-Index blog run by Ulrich Schimmack (https://replicationindex.wordpress.com/2018/11/08/replicability-rankings-of-eminent-social-psychologists/). They come with the usual caveats for this kind of analysis:
The results should not be overinterpreted. They are estimates based on an objective statistical procedure, but no statistical method can compensate perfectly for the various practices that led to the observed distribution of p-values (transformed into z-scores). However, in the absence of any information which results can be trusted, these graphs provide some information. How this information is used by consumers depends ultimately on consumers’ subjective beliefs. Information about the average replicability of researchers’ published results may influence these beliefs.
The replication index (R-Index) is defined as:
R-Index = Median Observed Power – Inflation
Inflation = Percentage of Significant Results – Median (Estimated Power)
See details in https://replicationindex.wordpress.com/2016/01/31/a-revised-introduction-to-the-r-index/.
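As a rough illustration of how the two formulas combine, here is a minimal sketch in R. The z-scores are made up (they are not from Schimmack's data), observed power is approximated from each z-score for a two-sided test at alpha = .05, and "Median (Estimated Power)" is assumed to be the same quantity as the median observed power. Under those assumptions the R-Index reduces to twice the median observed power minus the success rate.

```r
# Hypothetical z-scores from one researcher's focal tests
# (illustrative only, not taken from the Schimmack data)
z <- c(2.1, 2.4, 1.7, 3.0, 2.2, 2.6, 1.5, 2.0)

# Significance criterion for a two-sided test at alpha = .05
z_crit <- qnorm(0.975)

# Observed power of each test: the chance of exceeding the criterion if the
# true effect equalled the observed one (the tiny opposite tail is ignored)
obs_power <- pnorm(abs(z) - z_crit)

median_obs_power <- median(obs_power)
success_rate <- mean(abs(z) > z_crit)   # share of significant results

# The two definitions, applied in order
inflation <- success_rate - median_obs_power
r_index <- median_obs_power - inflation

# Report everything as percentages
round(100 * c(median_power = median_obs_power, success_rate = success_rate,
              inflation = inflation, r_index = r_index))
```

A researcher whose success rate far exceeds what the median observed power can support gets a large inflation term and therefore a low R-Index.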
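#print numbers with 2 significant digits
options(digits = 2)
#load packages
library(pacman)
p_load(kirkegaard, googlesheets, rms)
#authenticate with Google and locate the spreadsheet
gs_auth()
gs = gs_url("https://docs.google.com/spreadsheets/d/17qaOuR1AHg65torM9hwQ49jhH-9M74DE4bU-kBie8Ig/edit#gid=0")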
## Sheet-identifying info appears to be a browser URL.
## googlesheets will attempt to extract sheet key from the URL.
## Putative key: 17qaOuR1AHg65torM9hwQ49jhH-9M74DE4bU-kBie8Ig
## Sheet successfully identified: "Replicability Rankings of Eminent Social Psychologists"
#read the sheet and convert column names to syntactically valid ones
d = gs_read(gs) %>% df_legalize_names()
## Accessing worksheet titled 'data'.
## Parsed with column specification:
## cols(
##   rank = col_integer(),
##   researcher = col_character(),
##   `R-Index (all)` = col_integer(),
##   `R-Index (2.0-2.5)` = col_integer(),
##   `R-Index (2.5-3.0)` = col_integer(),
##   `#P-vals` = col_integer(),
##   `H-Index` = col_integer(),
##   publications = col_integer(),
##   citations_thousands = col_integer()
## )
#citations (in thousands) per publication
d$citations_thousands_pp = d$citations_thousands / d$publications
#correlations among all numeric columns (rank and researcher dropped)
wtd.cors(d[-c(1:2)])
##                        R_Index_all R_Index_2_0_2_5 R_Index_2_5_3_0  P_vals
## R_Index_all                  1.000           0.620           0.859  0.0931
## R_Index_2_0_2_5              0.620           1.000           0.757  0.0212
## R_Index_2_5_3_0              0.859           0.757           1.000  0.0428
## P_vals                       0.093           0.021           0.043  1.0000
## H_Index                      0.204           0.058           0.119  0.0046
## publications                 0.228           0.097           0.212  0.0360
## citations_thousands          0.168           0.087           0.150 -0.0779
## citations_thousands_pp      -0.062           0.105           0.013 -0.2261
##                        H_Index publications citations_thousands
## R_Index_all             0.2041        0.228               0.168
## R_Index_2_0_2_5         0.0581        0.097               0.087
## R_Index_2_5_3_0         0.1192        0.212               0.150
## P_vals                  0.0046        0.036              -0.078
## H_Index                 1.0000        0.793               0.875
## publications            0.7930        1.000               0.655
## citations_thousands     0.8746        0.655               1.000
## citations_thousands_pp  0.0912       -0.305               0.438
##                        citations_thousands_pp
## R_Index_all                            -0.062
## R_Index_2_0_2_5                         0.105
## R_Index_2_5_3_0                         0.013
## P_vals                                 -0.226
## H_Index                                 0.091
## publications                           -0.305
## citations_thousands                     0.438
## citations_thousands_pp                  1.000
#scatterplots of R-Index against each success metric
GG_scatter(d, "R_Index_all", "H_Index", case_names = "researcher")
GG_scatter(d, "R_Index_all", "citations_thousands", case_names = "researcher")
GG_scatter(d, "R_Index_all", "citations_thousands_pp", case_names = "researcher")
#regress each success metric on R-Index, controlling for publications and number of p-values
ols(H_Index ~ R_Index_all + publications + P_vals, data = d)
## Linear Regression Model
##
## ols(formula = H_Index ~ R_Index_all + publications + P_vals,
##     data = d)
##
##                   Model Likelihood     Discrimination
##                      Ratio Test           Indexes
## Obs         71    LR chi2      70.59    R2       0.630
## sigma   8.6979    d.f.             3    R2 adj   0.613
## d.f.        67    Pr(> chi2)  0.0000    g       11.506
##
## Residuals
##
##      Min       1Q   Median       3Q      Max
## -22.3040  -4.8888  -0.1642   5.1334  20.2419
##
##
##              Coef    S.E.   t     Pr(>|t|)
## Intercept    25.5640 7.2548  3.52 0.0008
## R_Index_all   0.0413 0.1193  0.35 0.7302
## publications  0.1492 0.0145 10.32 <0.0001
## P_vals       -0.0007 0.0019 -0.35 0.7266
##
ols(citations_thousands ~ R_Index_all + publications + P_vals, data = d)
## Linear Regression Model
##
## ols(formula = citations_thousands ~ R_Index_all + publications +
##     P_vals, data = d)
##
##                   Model Likelihood     Discrimination
##                      Ratio Test           Indexes
## Obs         71    LR chi2      41.21    R2       0.440
## sigma   5.9395    d.f.             3    R2 adj   0.415
## d.f.        67    Pr(> chi2)  0.0000    g        5.321
##
## Residuals
##
##      Min       1Q   Median       3Q      Max
## -12.4689  -3.5479  -0.3465   3.7154  19.0317
##
##
##              Coef    S.E.   t     Pr(>|t|)
## Intercept     2.0144 4.9541  0.41 0.6856
## R_Index_all   0.0252 0.0815  0.31 0.7584
## publications  0.0686 0.0099  6.95 <0.0001
## P_vals       -0.0015 0.0013 -1.13 0.2610
##
ols(citations_thousands_pp ~ R_Index_all + publications + P_vals, data = d)
## Linear Regression Model
##
## ols(formula = citations_thousands_pp ~ R_Index_all + publications +
##     P_vals, data = d)
##
##                   Model Likelihood     Discrimination
##                      Ratio Test           Indexes
## Obs         71    LR chi2      10.70    R2       0.140
## sigma   0.0406    d.f.             3    R2 adj   0.101
## d.f.        67    Pr(> chi2)  0.0135    g        0.018
##
## Residuals
##
##      Min       1Q   Median       3Q      Max
## -0.07334 -0.02944 -0.00773  0.02088  0.15232
##
##
##              Coef    S.E.   t     Pr(>|t|)
## Intercept     0.1217 0.0339  3.59 0.0006
## R_Index_all   0.0001 0.0006  0.24 0.8128
## publications -0.0002 0.0001 -2.60 0.0113
## P_vals        0.0000 0.0000 -1.91 0.0600
##
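Thus, the replication index has only weak bivariate relations to the success metrics (r = 0.20 with H-index, 0.17 with total citations, and -0.06 with citations per publication), and no detectable relationship in regression models that control for number of publications and number of p-values (p = 0.73, 0.76, and 0.81, respectively). By these metrics, the game of science does not appear to reward scientists for higher-quality work.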
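To get a feel for how uncertain a correlation of this size is with 71 researchers, here is a minimal sketch of a 95% confidence interval using the Fisher z-transformation. It takes the R-Index/H-index correlation from the matrix above and assumes it is an ordinary Pearson correlation computed on all 71 cases; the variable names are only for illustration.

```r
r <- 0.204  # R_Index_all vs. H_Index, from the correlation matrix
n <- 71     # number of researchers (Obs in the regression output)

# Fisher z-transformation: atanh(r) is approximately normal
# with standard error 1 / sqrt(n - 3)
z <- atanh(r)
se <- 1 / sqrt(n - 3)

# 95% interval on the z scale, transformed back to the correlation scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
round(ci, 2)
```

The interval runs from roughly -0.03 to 0.42, so even the largest of the R-Index correlations with a success metric is compatible with no relationship at all, as well as with a moderate positive one.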
#inspect data
d