Pull data

Download

Read in cache.

Counts.

## # A tibble: 1 × 1
##       n
##   <int>
## 1   210
## # A tibble: 8 × 2
##   status                              n
##   <chr>                           <int>
## 1 ASK                                12
## 2 cannot tell direction in rep        4
## 3 cannot tell direction in target     1
## 4 done                              159
## 5 missing                             2
## 6 non-experiment                     12
## 7 reproduction                       19
## 8 unusable                            1
## # A tibble: 1 × 1
##       n
##   <int>
## 1   177
## # A tibble: 4 × 2
##   include      n
##   <chr>    <int>
## 1 exp         45
## 2 no          34
## 3 pred_int    24
## 4 stats      107

We have up to 177 studies, at least for some analyses.

Descriptions:

Parsing

We parse out values from the raw stats.
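A minimal sketch of this kind of parsing, assuming the raw stats are strings such as "t(28) = 2.10" or "F(1, 58) = 4.20". Both the string format and the helper name `parse_raw_stat` are hypothetical; the actual parser likely handles more cases.

```r
# Hypothetical parser: pulls the test type, degrees of freedom, and test
# value out of strings such as "t(28) = 2.10" or "F(1, 58) = 4.20".
parse_raw_stat <- function(raw) {
  pattern <- "^ *([A-Za-z]+) *\\(([^)]*)\\) *= *(-?[0-9.]+) *$"
  m <- regmatches(raw, regexec(pattern, raw))[[1]]
  if (length(m) == 0) {
    # Nothing matched: return NAs so unparsed rows are easy to find later.
    return(list(test = NA_character_, df1 = NA_real_,
                df2 = NA_real_, value = NA_real_))
  }
  # Degrees of freedom are comma-separated inside the parentheses.
  dfs <- as.numeric(trimws(strsplit(m[3], ",")[[1]]))
  list(test = m[2], df1 = dfs[1], df2 = dfs[2], value = as.numeric(m[4]))
}

parsed_f <- parse_raw_stat("F(1, 58) = 4.20")
parsed_t <- parse_raw_stat("t(28) = 2.10")
parsed_bad <- parse_raw_stat("not a stat")
```

Returning `NA` rather than raising an error keeps the parser usable row by row, with failures checked afterwards in one place.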

what didn’t parse

Check that nothing with a raw stat input fails to produce an effect size (ES). Both tables below are empty, so every raw stat parsed.

## # A tibble: 0 × 2
## # Rowwise: 
## # … with 2 variables: target_lastauthor_year <chr>, target_raw_stat <chr>
## # A tibble: 0 × 2
## # Rowwise: 
## # … with 2 variables: target_lastauthor_year <chr>, replication_raw_stat <chr>
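One way to run this kind of check, sketched in base R on toy data. Only the column names `target_lastauthor_year`, `target_raw_stat`, and `target_d_calc` come from the analysis; the rows and values are invented for illustration.

```r
# Toy data: the values are made up, only the column names are real.
toy <- data.frame(
  target_lastauthor_year = c("paper_a", "paper_b", "paper_c"),
  target_raw_stat = c("t(28) = 2.10", NA, "F(1, 58) = 4.20"),
  target_d_calc = c(0.79, NA, 0.54),
  stringsAsFactors = FALSE
)

# Rows that have a raw stat but no computed effect size.
unparsed <- toy[!is.na(toy$target_raw_stat) & is.na(toy$target_d_calc),
                c("target_lastauthor_year", "target_raw_stat")]
n_unparsed <- nrow(unparsed)
```

A zero-row result corresponds to the empty tibbles above.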

PredInt and P_orig

Compute prediction intervals and p_orig.
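A rough sketch of both quantities, assuming the common definitions: the prediction interval is centered on the original estimate with both standard errors pooled, and p_orig is a two-sided p-value for the original estimate against the replication. The function names, argument names, and the default of zero heterogeneity (`tau2 = 0`) are assumptions for illustration, not necessarily what this analysis uses.

```r
# Does the replication estimate fall inside the prediction interval
# implied by the original estimate and both standard errors?
pred_int_covers <- function(d_orig, se_orig, d_rep, se_rep, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  half_width <- z * sqrt(se_orig^2 + se_rep^2)
  abs(d_rep - d_orig) <= half_width
}

# Two-sided p-value for the original estimate, assuming it comes from the
# same distribution as the replication. tau2 is the assumed between-study
# heterogeneity (0 means none).
p_orig <- function(d_orig, se_orig, d_rep, se_rep, tau2 = 0) {
  z <- abs(d_orig - d_rep) / sqrt(tau2 + se_orig^2 + se_rep^2)
  2 * (1 - pnorm(z))
}

close_pair <- pred_int_covers(0.6, 0.2, 0.5, 0.1)  # estimates differ by 0.1
far_pair   <- pred_int_covers(1.5, 0.2, 0.1, 0.1)  # estimates differ by 1.4
p_same     <- p_orig(0.5, 0.2, 0.5, 0.1)           # identical estimates
p_far      <- p_orig(1.5, 0.2, 0.1, 0.1)
```

With `tau2 = 0` the two measures agree by construction: the replication lies inside the 95% interval exactly when p_orig is at least 0.05.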

viz SMD

First pass plots.

Working more on the visualization with subjective rep status.

How much missing data?

Note: it may seem odd that d and SE are missing for many more replications than p-values are. This is because d_calc cannot be computed without a filled-in value for same direction, even when the p-value and the unsigned d_calc are available.

## total rows
## [1] 177
## number of rows for subjective w/ demographic/experimental
## expected
## [1] 176
## actual
## [1] 176
## number of rows for predInt/p_orig w/ demographic/experimental
## expected
## [1] 131
## actual
## [1] 131
## number of complete rows for full analysis
## expected
## [1] 107
## actual
## [1] 107
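The note above on missing d can be illustrated with a small sketch: the signed effect size needs the direction flag, so a missing flag propagates to a missing d even when the magnitude is known. The names `sign_d` and `same_direction` are hypothetical.

```r
# Sign an unsigned effect size using a logical direction flag.
# A missing flag yields NA, even though the magnitude itself is available.
sign_d <- function(d_unsigned, same_direction) {
  ifelse(same_direction, d_unsigned, -d_unsigned)
}

signed <- sign_d(c(0.4, 0.4, 0.4), c(TRUE, FALSE, NA))
n_missing <- sum(is.na(signed))
```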

code vars for models

##    include          target_lastauthor_year academic_year     
##  Length:177         Length:177             Length:177        
##  Class :character   Class :character       Class :character  
##  Mode  :character   Mode  :character       Mode  :character  
##                                                              
##                                                              
##                                                              
##                                                              
##    subfield            pub_year           log_p             log_sample    
##  Length:177         Min.   :-48.525   Min.   :-295.9909   Min.   :0.6931  
##  Class :character   1st Qu.: -2.525   1st Qu.: -16.7525   1st Qu.:3.6889  
##  Mode  :character   Median :  1.475   Median :  -6.0717   Median :4.6151  
##                     Mean   :  0.000   Mean   : -16.1629   Mean   :4.5020  
##                     3rd Qu.:  3.475   3rd Qu.:  -3.5666   3rd Qu.:5.1957  
##                     Max.   :  8.475   Max.   :   0.6932   Max.   :7.4955  
##                                       NA's   :57          NA's   :1       
##   log_ratio_ss      change_platform  target_d_calc        stanford      
##  Min.   :-3.50656   Min.   :0.0000   Min.   :0.02236   Min.   :0.00000  
##  1st Qu.:-0.88504   1st Qu.:0.0000   1st Qu.:0.45040   1st Qu.:0.00000  
##  Median :-0.15149   Median :1.0000   Median :0.60249   Median :0.00000  
##  Mean   :-0.41790   Mean   :0.5311   Mean   :0.82353   Mean   :0.09605  
##  3rd Qu.: 0.05205   3rd Qu.:1.0000   3rd Qu.:0.95758   3rd Qu.:0.00000  
##  Max.   : 3.67313   Max.   :1.0000   Max.   :7.86738   Max.   :1.00000  
##  NA's   :1                           NA's   :56                         
##    open_data         open_mat        is_within     single_vignette 
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.000   Median :0.0000  
##  Mean   :0.2938   Mean   :0.4689   Mean   :0.452   Mean   :0.4407  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.000   Max.   :1.0000  
##                                                                    
##    log_trials       predInt           p_orig           sub_rep      
##  Min.   :0.000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :1.609   Median :0.0000   Median :0.03315   Median :0.5000  
##  Mean   :2.163   Mean   :0.4511   Mean   :0.21397   Mean   :0.4901  
##  3rd Qu.:4.094   3rd Qu.:1.0000   3rd Qu.:0.36853   3rd Qu.:1.0000  
##  Max.   :8.230   Max.   :1.0000   Max.   :0.99971   Max.   :1.0000  
##                  NA's   :44       NA's   :44        NA's   :1

z-score

check z-score
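A minimal sketch of z-scoring a predictor and checking the result, assuming missing values are ignored when computing the mean and standard deviation. The helper name `z_score` and the toy vector are illustrative only.

```r
# Standardize to mean 0 and SD 1, leaving NAs in place.
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(2, 4, 4, 5, NA, 9)
z <- z_score(x)

# The check: mean should be ~0 and SD ~1, up to floating-point error.
z_mean <- mean(z, na.rm = TRUE)
z_sd <- sd(z, na.rm = TRUE)
```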

tier 1 data

tier 2 data

tier 3 data

Pre-reg’d Models

Note: tier 2 and 3 models are subject to change, as a few more studies might in fact have usable stats and some of the effect size extraction might be incorrect.

Sensitivity analysis TODO

Exploratory working

correlation

including tier 3 predictors

Frequentist Lassoes

regularized tier 1 sub

regularized p_orig

Tier 3

Correlations

| preds           |      r |     p |
|:----------------|-------:|------:|
| z_pub_year      |  0.064 | 0.399 |
| open_data       |  0.150 | 0.047 |
| open_mat        |  0.002 | 0.979 |
| stanford        | -0.027 | 0.725 |
| change_platform | -0.158 | 0.037 |
| z_log_ratio_ss  | -0.047 | 0.536 |
| is_within       |  0.333 | 0.000 |
| single_vignette | -0.267 | 0.000 |
| z_log_sample    | -0.108 | 0.155 |
| z_log_trials    |  0.182 | 0.015 |
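A table of this shape can be built by running `cor.test()` once per predictor. The sketch below uses simulated data, and the helper name `cor_table` is hypothetical. Which outcome the table above uses is not stated here, so the outcome is left as an argument.

```r
# One Pearson correlation test per predictor, collected into a data frame.
# cor.test() drops incomplete pairs before computing r and p.
cor_table <- function(data, outcome, preds) {
  rows <- lapply(preds, function(p) {
    ct <- cor.test(data[[p]], data[[outcome]])
    data.frame(preds = p,
               r = round(unname(ct$estimate), 3),
               p = round(ct$p.value, 3))
  })
  do.call(rbind, rows)
}

# Simulated data, for illustration only.
set.seed(1)
n <- 50
toy <- data.frame(
  sub_rep = rnorm(n),
  is_within = rbinom(n, 1, 0.5),
  z_log_trials = rnorm(n)
)
tab <- cor_table(toy, "sub_rep", c("is_within", "z_log_trials"))
```

Note that rounding to three decimals prints very small p-values as 0.000.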

Individual predictor - outcome correlations

Sub_rep

Pred_int

Note: predInt may not be reliably calculated in some cases. Dealing with numbers is hard!

P_orig

Note: as with predInt, p_orig may not be reliably calculated in some cases. Dealing with numbers is hard!

# Exploratory even randomer

Model exploration

## 
## Call:
## lm(formula = sub_rep ~ z_pub_year + subfield + open_data + open_mat + 
##     stanford + change_platform + z_log_ratio_ss + is_within + 
##     single_vignette + z_log_sample + z_log_trials, data = data_tier1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7511 -1.3244 -0.2346  1.3409  3.4323 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3.218252   0.407093   7.905 3.89e-13 ***
## z_pub_year           0.094443   0.143961   0.656  0.51274    
## subfieldnon-psych   -0.094368   0.455422  -0.207  0.83611    
## subfieldother-psych  0.005194   0.403553   0.013  0.98975    
## subfieldsocial      -0.600773   0.345548  -1.739  0.08400 .  
## open_data            0.472727   0.357222   1.323  0.18759    
## open_mat            -0.508718   0.327028  -1.556  0.12176    
## stanford            -0.008220   0.442056  -0.019  0.98519    
## change_platform     -0.589066   0.288515  -2.042  0.04280 *  
## z_log_ratio_ss      -0.117647   0.157094  -0.749  0.45501    
## is_within            1.218763   0.381555   3.194  0.00169 ** 
## single_vignette     -0.328094   0.483279  -0.679  0.49818    
## z_log_sample        -0.091768   0.208241  -0.441  0.66003    
## z_log_trials        -0.350492   0.260318  -1.346  0.18005    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.633 on 162 degrees of freedom
## Multiple R-squared:  0.2007, Adjusted R-squared:  0.1365 
## F-statistic: 3.128 on 13 and 162 DF,  p-value: 0.0003479

Repeating regularized

## 11 x 1 sparse Matrix of class "dgCMatrix"
##                         s1
## (Intercept)      2.8428523
## z_pub_year       .        
## open_data        0.1085095
## open_mat         .        
## stanford         .        
## change_platform -0.4669277
## z_log_ratio_ss   .        
## is_within        0.9137496
## single_vignette -0.1819649
## z_log_sample     .        
## z_log_trials     .
## 11 x 1 sparse Matrix of class "dgCMatrix"
##                          s1
## (Intercept)      2.72149889
## z_pub_year       .         
## open_data        .         
## open_mat         .         
## stanford         .         
## change_platform -0.08774184
## z_log_ratio_ss   .         
## is_within        0.62829911
## single_vignette  .         
## z_log_sample     .         
## z_log_trials     .
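For reference, a sketch of a cross-validated lasso on simulated data, assuming the glmnet package is installed. Extracting coefficients at two penalty values (`lambda.min` and `lambda.1se`) is one way to end up with two coefficient tables of differing sparsity like those above; whether that is how they were produced is an assumption.

```r
# Guarded so the sketch is skipped if glmnet is not installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  library(glmnet)

  # Simulated data: only x1 and x2 truly affect y.
  set.seed(1)
  n <- 100
  x <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
  y <- 2 * x[, 1] - x[, 2] + rnorm(n)

  # alpha = 1 is the lasso penalty; lambda is chosen by cross-validation.
  cv_fit <- cv.glmnet(x, y, alpha = 1)

  # Coefficients at the CV-optimal penalty and at the more conservative
  # one-standard-error penalty. Dots in the printout mark exact zeros.
  coef_min <- coef(cv_fit, s = "lambda.min")
  coef_1se <- coef(cv_fit, s = "lambda.1se")
  print(coef_min)
  print(coef_1se)
}
```

The heavier penalty usually leaves fewer nonzero coefficients.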

Principal components

Principal components are not very interpretable.

## Importance of components:
##                          Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
## Standard deviation     1.504085 1.0247345 0.9327507 0.55806625 0.52678095
## Proportion of Variance 0.429554 0.1993864 0.1651977 0.05913497 0.05269058
## Cumulative Proportion  0.429554 0.6289404 0.7941381 0.85327309 0.90596367
##                            Comp.6    Comp.7     Comp.8     Comp.9    Comp.10
## Standard deviation     0.40112599 0.3263851 0.30491529 0.27764252 0.24033322
## Proportion of Variance 0.03055164 0.0202271 0.01765352 0.01463676 0.01096732
## Cumulative Proportion  0.93651531 0.9567424 0.97439593 0.98903268 1.00000000
## 
## Loadings:
##                 Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## z_pub_year       0.306  0.643  0.602  0.317  0.162                            
## open_data               0.170        -0.506         0.186        -0.802       
## open_mat         0.149  0.120        -0.556         0.517 -0.234  0.557       
## stanford                                     0.110  0.126               -0.971
## change_platform -0.123        -0.113  0.531 -0.121  0.786        -0.177  0.108
## z_log_ratio_ss  -0.409 -0.341  0.709        -0.450                            
## is_within       -0.199  0.222 -0.107 -0.166         0.167  0.832              
## single_vignette  0.185 -0.278  0.135         0.279  0.114 -0.236        -0.102
## z_log_sample     0.598        -0.155  0.138 -0.745 -0.104                     
## z_log_trials    -0.510  0.541 -0.227        -0.297 -0.104 -0.432        -0.111
##                 Comp.10
## z_pub_year             
## open_data              
## open_mat               
## stanford        -0.164 
## change_platform        
## z_log_ratio_ss         
## is_within        0.377 
## single_vignette  0.839 
## z_log_sample     0.147 
## z_log_trials     0.306 
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings       1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0
## Proportion Var    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1
## Cumulative Var    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
##                Comp.10
## SS loadings        1.0
## Proportion Var     0.1
## Cumulative Var     1.0
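Output in this format comes from `summary()` on a `princomp` fit with `loadings = TRUE`. A small sketch on simulated data shows how the "Proportion of Variance" row is derived. It also shows why the "SS loadings" rows are uninformative: each loading vector has unit length, so its sum of squares is always 1. The toy variables are invented for illustration.

```r
# Simulated data with two correlated columns (a and c) and one independent.
set.seed(1)
toy <- data.frame(a = rnorm(50), b = rnorm(50))
toy$c <- toy$a + rnorm(50, sd = 0.3)

pca <- princomp(toy)           # covariance-based by default (cor = FALSE)
summary(pca, loadings = TRUE)  # prints importance table and loadings

# Proportion of variance: squared component SDs over their total.
prop_var <- unname(pca$sdev^2 / sum(pca$sdev^2))

# Sum of squared loadings per component: always 1.
ss_loadings <- unname(colSums(unclass(pca$loadings)^2))
```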

Bayesian tinkering

Let’s try super strong priors, consolidating a few very highly correlated variables.

And let’s see if it’s the link

Models of clusters of predictors

## 
## Call:
## lm(formula = sub_rep ~ is_within + single_vignette + z_log_trials, 
##     data = data_tier1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7587 -1.3730 -0.3709  1.3848  2.8753 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       2.7485     0.3126   8.792 1.48e-15 ***
## is_within         1.2259     0.3624   3.383 0.000888 ***
## single_vignette  -0.7744     0.4353  -1.779 0.076969 .  
## z_log_trials     -0.4134     0.2221  -1.861 0.064390 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.652 on 172 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1165 
## F-statistic: 8.695 on 3 and 172 DF,  p-value: 2.106e-05

other bayesian models

TODO predictive

## # A tibble: 1 × 12
##   mean_pub_year mean_l…¹ mean_…² mean_…³ mean_…⁴ mean_…⁵ sd_pu…⁶ sd_lo…⁷ sd_lo…⁸
##           <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1     -6.17e-14    -16.2    4.50  -0.418   0.824    2.16    6.47    33.1    1.11
## # … with 3 more variables: sd_log_ratio_ss <dbl>, sd_target_d_calc <dbl>,
## #   sd_log_trials <dbl>, and abbreviated variable names ¹​mean_log_p,
## #   ²​mean_log_sample, ³​mean_log_ratio_ss, ⁴​mean_target_d_calc,
## #   ⁵​mean_log_trials, ⁶​sd_pub_year, ⁷​sd_log_p, ⁸​sd_log_sample