Pull data

Download

Read in cache.

Parsing

We parse out values from the raw stats.

what didn’t parse

Check that every row with a raw stat input also produces an effect size (ES). Rows that fail this check are listed below.

## # A tibble: 7 × 3
## # Rowwise: 
##   target_lastauthor_year type     raw_stat                     
##   <chr>                  <fct>    <chr>                        
## 1 chou2016               rescue   t(240.14)=.50322TODOrederive!
## 2 craig2014              rescue   TODOIWANTRAWNUMBERS          
## 3 gong2019               rescue   F(1,113)=6.2934750           
## 4 schechtman2010         original <NA>                         
## 5 schechtman2010         rep1     t(1420)=-2.384               
## 6 schechtman2010         rescue   F(1,1508)=1.429              
## 7 yeshurun2003           rescue   <NA>

## # A tibble: 4 × 3
## # Rowwise: 
##   target_lastauthor_year type     raw_stat                     
##   <chr>                  <fct>    <chr>                        
## 1 chou2016               rescue   t(240.14)=.50322TODOrederive!
## 2 craig2014              rescue   TODOIWANTRAWNUMBERS          
## 3 schechtman2010         original <NA>                         
## 4 yeshurun2003           rescue   <NA>
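
As an illustration of the kind of check described above, here is a minimal sketch of parsing raw stat strings such as `t(1420)=-2.384` or `F(1,1508)=1.429`. It is not the parser used in this analysis: the pattern, the `parse_raw_stat` name, and the returned fields are all assumptions made for the example. Strings with trailing notes (like the TODO entries above) and missing values deliberately come back as `None`, so they can be flagged.

```python
import re
from typing import Optional

# Hypothetical pattern: test name (t or F), one or two degrees of freedom, then the value.
STAT_PATTERN = re.compile(
    r"(?P<test>[tF])\((?P<df1>\d*\.?\d+)(?:,(?P<df2>\d*\.?\d+))?\)=(?P<value>-?\d*\.?\d+)"
)

def parse_raw_stat(raw_stat: Optional[str]) -> Optional[dict]:
    """Return the parsed pieces of a raw stat string, or None if it does not parse."""
    if raw_stat is None:
        return None
    # fullmatch rejects strings with anything left over, e.g. a trailing TODO note.
    match = STAT_PATTERN.fullmatch(raw_stat.replace(" ", ""))
    if match is None:
        return None
    dfs = [float(d) for d in (match["df1"], match["df2"]) if d is not None]
    return {"test": match["test"], "df": dfs, "value": float(match["value"])}

examples = ["t(1420)=-2.384", "F(1,1508)=1.429", "t(240.14)=.50322TODOrederive!", None]
for raw in examples:
    print(raw, "->", parse_raw_stat(raw))
```

Using a full match rather than a prefix match is a design choice for this sketch: a partially matching string is treated as unparsed instead of silently dropping the leftover text.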

Draft plot of effect sizes

Subjective

## # A tibble: 4 × 2
##   replication_score     n
##               <dbl> <int>
## 1              0       11
## 2              0.75     2
## 3              1        3
## 4             NA        1

##       rho 
## 0.9032829

Of a total of 17 replications, 5 succeeded at mostly or fully replicating the original results (11 had a score of 0, 2 had a score of 0.75, and 3 had a score of 1; 1 has no score). The interrater reliability was rho = 0.903.

We correlate each predictor with the subjective replication score.

Predictors                      r      p
Social                      0.110  0.686
Other psych                -0.320  0.228
Within subjects             0.299  0.261
Single vignette            -0.065  0.810
Switch to online            0.140  0.606
Open data                   0.213  0.427
Open materials              0.449  0.081
Stanford                   -0.251  0.347
Log trials                  0.065  0.810
Log original sample size   -0.060  0.825
Log rep/orig sample         0.046  0.865
rep_1_log_sample           -0.374  0.153
log_ratio_rep1_orig        -0.469  0.067
log_ratio_rescue_rep1       0.495  0.051
  • If there is a mixture of projects that succeed and fail to replicate the original results, we will qualitatively describe differences that may have played a role.
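
For readers who want to sanity-check the p-values in the predictor table, the sketch below recomputes a two-sided p-value from a correlation using the standard conversion t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sample size n = 16 is an assumption (17 projects minus the one without a score), and whether the analysis used exactly this test is also an assumption. The function names are hypothetical, and the t distribution is integrated numerically to keep the example free of third-party dependencies.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def correlation_p_value(r: float, n: int, steps: int = 2000) -> float:
    """Two-sided p-value for a correlation r computed from n pairs."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area under the density on [0, |t|].
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1 - 2 * area

# n = 16 is an assumption, not a value stated in the analysis.
for label, r in [("Open materials", 0.449), ("log_ratio_rescue_rep1", 0.495)]:
    print(label, round(correlation_p_value(r, n=16), 3))
```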

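The only moderately strong predictors are the relative sample size measures: projects where the first replication had a small sample relative to the original (due to an inflated original effect size, or exclusion/attrition issues) and where the rescue recruited more participants tended to score higher. Neither correlation reaches p < .05. (I added a couple of non-preregistered measures of the relative sample size of rep1.)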
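
To make the relative sample size measures concrete, here is a small sketch that computes log ratios for three rows of the sample size table in this document. The definitions are assumptions inferred from the variable names: `log_ratio_rep1_orig` is taken to be log(N_rep1 / N_original) and `log_ratio_rescue_rep1` to be log(N_rescue / N_rep1), both with the natural log. The analysis may define them differently.

```python
import math

# Rows copied from the sample size table: (paper label, N_original, N_rep1, N_rescue).
rows = [("krau…", 101, 19, 75), ("ngo2…", 31, 12, 77), ("tara…", 139, 212, 166)]

def log_ratio(numerator: float, denominator: float) -> float:
    """Natural log of a ratio of sample sizes (assumed definition)."""
    return math.log(numerator / denominator)

for paper, n_orig, n_rep1, n_rescue in rows:
    rep1_vs_orig = log_ratio(n_rep1, n_orig)      # negative when rep1 is smaller than the original
    rescue_vs_rep1 = log_ratio(n_rescue, n_rep1)  # positive when the rescue recruited more than rep1
    print(paper, round(rep1_vs_orig, 2), round(rescue_vs_rep1, 2))
```

Under these assumed definitions, a project whose first replication was small and whose rescue was larger gets a negative first ratio and a positive second one, which matches the direction of the two correlations in the predictor table.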

Table of experiments by sample size and closeness

## # A tibble: 17 × 7
##    paper rescue_score N_original N_rep1 N_rescue closeness_rep1 closeness_rescue
##    <chr>        <dbl>      <dbl>  <dbl>    <dbl> <chr>          <chr>           
##  1 krau…         1           101     19       75 close          very close      
##  2 ngo2…         1            31     12       77 very close     very close      
##  3 todd…         1            63     26       55 very close     very close      
##  4 jara…         0.75        144    147      426 exact          exact           
##  5 port…         0.75        145    168      136 close          very close      
##  6 birc…         0           103     73      247 very close     very close      
##  7 chil…         0            35     40       98 very close     very close      
##  8 chou…         0           100    158      252 close          very close      
##  9 crai…         0           121     76      127 exact          exact           
## 10 gong…         0           155     90      137 far            far             
## 11 haim…         0           132     97      141 exact          exact           
## 12 hopk…         0           147     93      161 very close     very close      
## 13 paxt…         0            92     82      160 close          close           
## 14 payn…         0            48     23       23 far            very close      
## 15 sche…         0            39     20       21 close          close           
## 16 tara…         0           139    212      166 close          close           
## 17 yesh…        NA            18     10       NA close          <NA>

PredInt and P_orig

NOTE: p-original can only be computed on the SMD scale here, because the tau imputation is in SMD units.

orig v all on p-orig

  • We will use p-original to evaluate how consistent the original effect size is with the totality of replications. We expect there to be a small number of replications, so we will impute the heterogeneity value as in https://osf.io/preprints/psyarxiv/dpyn6/.
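
As a rough sketch of what p-original measures, the function below follows the commonly cited form P_orig = 2 * (1 - Phi(|theta_orig - mu| / sqrt(tau^2 + SE_orig^2 + SE_mu^2))), where mu is the pooled replication estimate and tau is the heterogeneity standard deviation. This is an illustration, not the code used for the results in this document: the function name, argument names, and the numbers in the usage loop are made up for the example, and the imputed tau value has to come from the linked preprint rather than from this sketch. All inputs are assumed to be on the SMD scale.

```python
from math import sqrt
from statistics import NormalDist

def p_orig(theta_orig: float, se_orig: float, mu_hat: float, se_mu: float, tau: float) -> float:
    """Two-sided probability of an original estimate at least this far from the replication mean."""
    # Total uncertainty: heterogeneity plus sampling error in both estimates.
    total_sd = sqrt(tau**2 + se_orig**2 + se_mu**2)
    z = abs(theta_orig - mu_hat) / total_sd
    return 2 * (1 - NormalDist().cdf(z))

# Illustrative numbers only; tau is a placeholder, not the imputed value from the preprint.
for tau in (0.0, 0.2):
    print(tau, round(p_orig(theta_orig=0.6, se_orig=0.2, mu_hat=0.1, se_mu=0.1, tau=tau), 4))
```

A larger tau widens the denominator, so the same gap between original and replications yields a larger p-original. This is why the choice of imputed heterogeneity matters when there are too few replications to estimate it.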

secondary, original and rescue

  1. p-original between just the original and rescue,

original v !rescue

  1. p-original between the original and all replications except the rescue (in the case where no replications are found in the literature, this is the same as done in https://osf.io/preprints/psyarxiv/dpyn6/)

rescue v reps

  1. p-original between the rescue and all other replications.

show p_origs

## # A tibble: 17 × 5
## # Rowwise:  target_lastauthor_year
##    target_lastauthor_year orig_v_other orig_v_rescue orig_v_not_rescue
##    <chr>                         <dbl>         <dbl>             <dbl>
##  1 birch2007                  0.192      0.194                 1.91e-1
##  2 child2018                  0.405      0.641                 3.77e-9
##  3 chou2016                  NA         NA                     4.91e-3
##  4 craig2014                 NA         NA                     3.02e-1
##  5 gong2019                  NA         NA                     5.11e-1
##  6 haimovitz2016              0.0687     0.0182                1.80e-1
##  7 hopkins2016               NA         NA                    NA      
##  8 jara-ettinger2022         NA         NA                    NA      
##  9 krauss2003                NA         NA                    NA      
## 10 ngo2019                    0.106      0.00000903            3.85e-1
## 11 paxton2012                 0.0109     0.00545               2.26e-2
## 12 payne2008                  0.0714     0.0308                2.45e-1
## 13 porter2016                 0.370      0.905                 2.30e-3
## 14 schechtman2010            NA         NA                    NA      
## 15 tarampi2016                0.000341   0.000000291           2.24e-4
## 16 todd2016                   0.0619     0.124                 5.81e-2
## 17 yeshurun2003              NA         NA                     1.60e-2
## # ℹ 1 more variable: rescue_v_reps <dbl>

Some predInt viz! (This is currently with no heterogeneity; idk what we actually want.)

We will visualize the consistency between original, 1st replication, rescue, and any other replications by plotting effect size and prediction interval for each.
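
A minimal sketch of one way to compute such a prediction interval is shown below. It assumes the interval is centered on the original estimate with variance equal to the sum of the original and replication sampling variances. The optional `extra_var` argument is a hypothetical hook for adding heterogeneity variance; how much to add (for example tau^2 or 2 * tau^2) depends on the model and is left open here. The function name and the example numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def prediction_interval(theta_orig: float, se_orig: float, se_rep: float,
                        extra_var: float = 0.0, level: float = 0.95) -> tuple:
    """Interval in which a replication estimate is expected to fall, given the original."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Sampling variance of both studies, plus any heterogeneity variance the analyst adds.
    half_width = z * sqrt(se_orig**2 + se_rep**2 + extra_var)
    return theta_orig - half_width, theta_orig + half_width

# Illustrative numbers only, on the SMD scale.
print(prediction_interval(0.5, 0.1, 0.1))
print(prediction_interval(0.5, 0.1, 0.1, extra_var=0.04))
```

A replication estimate falling outside the interval is less consistent with the original than sampling error alone (plus any added heterogeneity) would suggest.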

viz attempts

predInt with heterogeneity

viz attempts