Pull data

Download

Read in cache.

Counts.

## # A tibble: 1 × 1
##       n
##   <int>
## 1   210

## # A tibble: 12 × 2
##    status                                       n
##    <chr>                                    <int>
##  1 ASK                                         12
##  2 cannot tell direction in rep                 2
##  3 cannot tell direction in replication         1
##  4 cannot tell direction in target              1
##  5 cannot tell direction on rep                 1
##  6 done                                       156
##  7 missing                                      2
##  8 non-experiment                              12
##  9 rep is the problem, might be salvageable     1
## 10 reproduction                                19
## 11 unusable                                     1
## 12 use raw!!!                                   2

## # A tibble: 1 × 1
##       n
##   <int>
## 1   177

## # A tibble: 4 × 2
##   include      n
##   <chr>    <int>
## 1 exp         48
## 2 no          34
## 3 pred_int    24
## 4 stats      104

That leaves up to 177 replications usable for at least some analyses.

Status descriptions:

“done” – fully coded; can be used for whatever is specified in the other column
“ES_direction” – the effect-size direction is not coded
“ask” – might be salvageable, but I have questions about the coding
“unusable” – coded as far as I could, but complete descriptives are not available
“missing” – the student didn’t finish, or we don’t have the write-up
“non-experiment” – the student replicated something that was not an experiment
“reproduction” – the student did a reproduction rather than a replication

Parsing

We parse numeric values out of the raw stat strings (target_raw_stat and replication_raw_stat).

what didn’t parse

Check that every row with a raw stat input gets an effect size (ES) out; any rows that don’t are listed below.

## # A tibble: 0 × 2
## # Rowwise: 
## # … with 2 variables: target_lastauthor_year <chr>, target_raw_stat <chr>

## # A tibble: 1 × 2
## # Rowwise: 
##   target_lastauthor_year replication_raw_stat                            
##   <chr>                  <chr>                                           
## 1 obrien2015             MSE4:m1=4.8(.6),m2=5.5(.6),m3=3.8(.5),m4=3.7(.5)
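The parser itself isn’t shown. As a minimal sketch of the idea, assuming a raw stat written as t(df)=value from a between-subjects two-group design (so d = 2t/√df), a parse-and-convert step could look like the following. The function name parse_t_to_d and the accepted format are hypothetical; any string it can’t match, such as the obrien2015 MSE string above, comes back as NA, which is the kind of row the check above surfaces.

```r
# Hypothetical helper: parse a raw stat of the form "t(df)=value" and
# convert it to Cohen's d, ASSUMING a between-subjects two-group design
# (d = 2t / sqrt(df)). Any other format returns NA.
parse_t_to_d <- function(raw_stat) {
  pattern <- "^\\s*t\\(\\s*([0-9.]+)\\s*\\)\\s*=\\s*(-?[0-9.]+)\\s*$"
  out <- rep(NA_real_, length(raw_stat))
  ok <- grepl(pattern, raw_stat)                      # rows in t(df)=value form
  df <- as.numeric(sub(pattern, "\\1", raw_stat[ok])) # degrees of freedom
  t_val <- as.numeric(sub(pattern, "\\2", raw_stat[ok]))
  out[ok] <- 2 * t_val / sqrt(df)
  out
}

parse_t_to_d(c("t(36)=3", "t(28) = -2.1",
               "MSE4:m1=4.8(.6),m2=5.5(.6),m3=3.8(.5),m4=3.7(.5)"))
```

Extending this to F, r, or means-and-SDs formats would mean one pattern and one conversion formula per format.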

PredInt and P_orig

Compute prediction intervals and p_orig.
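The computation isn’t shown here. A minimal sketch, assuming the common normal-approximation forms: a 95% prediction interval for the replication estimate of d_orig ± 1.96·√(SE_orig² + SE_rep²), predInt coded 1 when the replication estimate falls inside it, and p_orig as the two-sided p-value for the difference between the original and replication estimates. This ignores between-study heterogeneity, and the function and argument names are hypothetical.

```r
# Hypothetical helper: prediction interval and p_orig for an
# original/replication pair, using a normal approximation and ASSUMING
# no between-study heterogeneity.
pi_and_porig <- function(d_orig, se_orig, d_rep, se_rep, level = 0.95) {
  se_diff <- sqrt(se_orig^2 + se_rep^2)   # SE of the difference
  z_crit <- qnorm(1 - (1 - level) / 2)    # about 1.96 for a 95% interval
  lower <- d_orig - z_crit * se_diff
  upper <- d_orig + z_crit * se_diff
  z <- (d_orig - d_rep) / se_diff
  data.frame(
    pi_lower = lower,
    pi_upper = upper,
    predInt  = as.integer(d_rep >= lower & d_rep <= upper),  # 1 = inside PI
    p_orig   = 2 * pnorm(-abs(z))                            # two-sided
  )
}

pi_and_porig(d_orig = 0.5, se_orig = 0.2, d_rep = c(0.5, 0), se_rep = 0.15)
```

Here the SE of the difference is 0.25, so the interval runs from about 0.01 to 0.99: a replication estimate of 0.5 sits inside it (p_orig = 1), while 0 falls just outside (p_orig ≈ 0.046).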

viz SMD

First pass plots.

Further work on the visualization, adding subjective replication status.

How much missing data?

Note: it may seem odd that we’re missing d and SE for many more replications than p values. This is because a signed d_calc requires a filled-in value for same direction; when that value is missing, we still have the p value and the unsigned d_calc, but no signed d or SE.
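To make that concrete, a minimal sketch with hypothetical column names (d_calc_unsigned, p_value, same_direction) and toy values: the signed effect size is only defined when the direction is coded, so a row with a p value but no direction ends up with NA for the signed d.

```r
# Toy rows with HYPOTHETICAL column names: an unsigned effect size, a p value,
# and a coded flag for whether the replication went in the target's direction.
reps <- data.frame(
  d_calc_unsigned = c(0.40, 0.25, 0.60),
  p_value         = c(0.01, 0.20, 0.03),
  same_direction  = c(TRUE, FALSE, NA)
)

# Sign the effect size by direction; an uncoded direction yields NA,
# even though the p value and the unsigned d are still available.
reps$d_calc <- ifelse(reps$same_direction,
                      reps$d_calc_unsigned, -reps$d_calc_unsigned)

colSums(is.na(reps))  # missingness per column
```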

## total rows

## [1] 177

## number of rows for subjective w/ demographic/experimental

## expected

## [1] 176

## actual

## [1] 176

## number of rows for predInt/p_orig w/ demographic/experimental
## expected

## [1] 128

## actual

## [1] 127

## number of complete rows for full analysis
## expected

## [1] 104

## actual

## [1] 103

code vars for models

##    include          target_lastauthor_year academic_year     
##  Length:177         Length:177             Length:177        
##  Class :character   Class :character       Class :character  
##  Mode  :character   Mode  :character       Mode  :character  
##                                                              
##                                                              
##                                                              
##                                                              
##    subfield            pub_year           log_p             log_sample    
##  Length:177         Min.   :-48.525   Min.   :-295.9909   Min.   :0.6931  
##  Class :character   1st Qu.: -2.525   1st Qu.: -15.9231   1st Qu.:3.6889  
##  Mode  :character   Median :  1.475   Median :  -6.0717   Median :4.6151  
##                     Mean   :  0.000   Mean   : -15.1343   Mean   :4.5020  
##                     3rd Qu.:  3.475   3rd Qu.:  -3.7106   3rd Qu.:5.1957  
##                     Max.   :  8.475   Max.   :   0.6932   Max.   :7.4955  
##                                       NA's   :57          NA's   :1       
##   log_ratio_ss      change_platform  target_d_calc         stanford      
##  Min.   :-3.50656   Min.   :0.0000   Min.   :  0.0562   Min.   :0.00000  
##  1st Qu.:-0.88504   1st Qu.:0.0000   1st Qu.:  0.4679   1st Qu.:0.00000  
##  Median :-0.15149   Median :1.0000   Median :  0.6905   Median :0.00000  
##  Mean   :-0.41790   Mean   :0.5311   Mean   :  6.9238   Mean   :0.09605  
##  3rd Qu.: 0.05205   3rd Qu.:1.0000   3rd Qu.:  1.5818   3rd Qu.:0.00000  
##  Max.   : 3.67313   Max.   :1.0000   Max.   :452.2157   Max.   :1.00000  
##  NA's   :1                           NA's   :56                          
##    open_data         open_mat        is_within     single_vignette 
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.000   Median :0.0000  
##  Mean   :0.2938   Mean   :0.4689   Mean   :0.452   Mean   :0.4407  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.000   Max.   :1.0000  
##                                                                    
##    log_trials       predInt           p_orig           sub_rep      
##  Min.   :0.000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.00001   1st Qu.:0.0000  
##  Median :1.609   Median :0.0000   Median :0.03442   Median :0.5000  
##  Mean   :2.163   Mean   :0.4651   Mean   :0.24073   Mean   :0.4901  
##  3rd Qu.:4.094   3rd Qu.:1.0000   3rd Qu.:0.46557   3rd Qu.:1.0000  
##  Max.   :8.230   Max.   :1.0000   Max.   :0.99971   Max.   :1.0000  
##                  NA's   :48       NA's   :48        NA's   :1

z-score

check z-score
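A minimal sketch of the standardization and the check, assuming the z_ variables are plain z-scores, (x − mean) / SD, of the logged or centered columns summarized above. The helper names z_score and check_z are hypothetical and the input values are illustrative only; base R’s scale() is an alternative for whole matrix columns.

```r
# Hypothetical helper: standardize a numeric vector, ignoring missing values.
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Hypothetical check: a z-scored vector should have mean ~0 and SD ~1.
check_z <- function(z, tol = 1e-8) {
  abs(mean(z, na.rm = TRUE)) < tol && abs(sd(z, na.rm = TRUE) - 1) < tol
}

# Illustrative values only (not the project's data); NAs stay NA.
log_sample <- log(c(2, 40, 100, 180, NA, 1800))
z_log_sample <- z_score(log_sample)
check_z(z_log_sample)
```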

tier 1 data

tier 2 data

tier 3 data

Pre-reg’d Models

Random exploratory analyses

correlation

## # A tibble: 91 × 3
##    value2          value1           corr
##    <chr>           <chr>           <dbl>
##  1 is_within       z_log_trials     0.7 
##  2 open_data       open_mat         0.56
##  3 is.soc          single_vignette  0.51
##  4 is.cog          z_log_trials     0.39
##  5 open_data       z_pub_year       0.38
##  6 single_vignette z_log_sample     0.38
##  7 open_mat        z_pub_year       0.35
##  8 open_mat        z_log_sample     0.35
##  9 is_within       is.cog           0.31
## 10 z_log_sample    z_pub_year       0.3 
## # … with 81 more rows
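With k predictors there are k(k − 1)/2 distinct pairs, so the 91 rows above correspond to 14 variables. A base-R sketch of building such a ranked pair list, sorting by absolute correlation (the original may order rows differently); ranked_correlations is a hypothetical name and the data are simulated.

```r
# Hypothetical helper: every distinct pair of columns with its Pearson
# correlation, sorted by absolute value. k columns give k * (k - 1) / 2 rows.
ranked_correlations <- function(df) {
  cm <- cor(df, use = "pairwise.complete.obs")
  idx <- which(upper.tri(cm), arr.ind = TRUE)  # each pair once, no diagonal
  out <- data.frame(
    value2 = rownames(cm)[idx[, "row"]],
    value1 = colnames(cm)[idx[, "col"]],
    corr   = round(cm[idx], 2),
    stringsAsFactors = FALSE
  )
  out <- out[order(-abs(out$corr)), ]
  rownames(out) <- NULL
  out
}

# Simulated data only: a and b are built to be strongly correlated.
set.seed(1)
x <- rnorm(100)
sim_preds <- data.frame(a = x, b = x + rnorm(100, sd = 0.5),
                        c = rnorm(100), d = rnorm(100))
res <- ranked_correlations(sim_preds)
res
```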

Exploring stuff for Tier 1

frequentism

regularized

Correlations

predictor            r      p
z_pub_year       0.064  0.399
open_data        0.150  0.047
open_mat         0.002  0.979
stanford        -0.027  0.725
change_platform -0.158  0.037
z_log_ratio_ss  -0.047  0.536
is_within        0.333  0.000
single_vignette -0.267  0.000
z_log_sample    -0.108  0.155
z_log_trials     0.182  0.015
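A minimal sketch of how a predictor-by-predictor table like this can be produced with cor.test(), which returns Pearson’s r and its p-value (for the 0/1 predictors this is the point-biserial correlation). The table doesn’t name the outcome; sub_rep is assumed here, the helper name is hypothetical, and the data are simulated.

```r
# Hypothetical helper: correlate each predictor with an outcome and
# return r and its p-value, rounded to three decimals.
predictor_outcome_cors <- function(df, outcome, predictors) {
  rows <- lapply(predictors, function(p) {
    ct <- cor.test(df[[p]], df[[outcome]])  # Pearson by default
    data.frame(preds = p,
               r = unname(round(ct$estimate, 3)),
               p = round(ct$p.value, 3),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Illustrative simulated data only; not the project's data.
set.seed(2)
sim <- data.frame(is_within = rbinom(150, 1, 0.45),
                  z_pub_year = rnorm(150))
sim$sub_rep <- 0.6 * sim$is_within + rnorm(150)
res <- predictor_outcome_cors(sim, "sub_rep", c("is_within", "z_pub_year"))
res
```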

Principal components

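The principal components are not very interpretable: the loadings are spread across many predictors, and the first component explains only about 43% of the variance.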

## Importance of components:
##                          Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
## Standard deviation     1.504085 1.0247345 0.9327507 0.55806625 0.52678095
## Proportion of Variance 0.429554 0.1993864 0.1651977 0.05913497 0.05269058
## Cumulative Proportion  0.429554 0.6289404 0.7941381 0.85327309 0.90596367
##                            Comp.6    Comp.7     Comp.8     Comp.9    Comp.10
## Standard deviation     0.40112599 0.3263851 0.30491529 0.27764252 0.24033322
## Proportion of Variance 0.03055164 0.0202271 0.01765352 0.01463676 0.01096732
## Cumulative Proportion  0.93651531 0.9567424 0.97439593 0.98903268 1.00000000

## 
## Loadings:
##                 Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## z_pub_year       0.306  0.643  0.602  0.317  0.162                            
## open_data               0.170        -0.506         0.186        -0.802       
## open_mat         0.149  0.120        -0.556         0.517 -0.234  0.557       
## stanford                                     0.110  0.126               -0.971
## change_platform -0.123        -0.113  0.531 -0.121  0.786        -0.177  0.108
## z_log_ratio_ss  -0.409 -0.341  0.709        -0.450                            
## is_within       -0.199  0.222 -0.107 -0.166         0.167  0.832              
## single_vignette  0.185 -0.278  0.135         0.279  0.114 -0.236        -0.102
## z_log_sample     0.598        -0.155  0.138 -0.745 -0.104                     
## z_log_trials    -0.510  0.541 -0.227        -0.297 -0.104 -0.432        -0.111
##                 Comp.10
## z_pub_year             
## open_data              
## open_mat               
## stanford        -0.164 
## change_platform        
## z_log_ratio_ss         
## is_within        0.377 
## single_vignette  0.839 
## z_log_sample     0.147 
## z_log_trials     0.306 
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings       1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0
## Proportion Var    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1
## Cumulative Var    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
##                Comp.10
## SS loadings        1.0
## Proportion Var     0.1
## Cumulative Var     1.0
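A minimal sketch of this kind of decomposition with base R’s princomp() on simulated predictors (the column names mirror the document’s, the values do not). Here cor = TRUE runs the PCA on the correlation matrix so that binary and continuous predictors share a common scale; which matrix the original run used isn’t shown above.

```r
# Illustrative simulated predictors only; not the project's data.
set.seed(3)
n <- 200
z_log_sample <- rnorm(n)
preds <- data.frame(
  z_log_sample = z_log_sample,
  z_log_trials = -0.5 * z_log_sample + rnorm(n),
  is_within    = rbinom(n, 1, 0.45),
  open_data    = rbinom(n, 1, 0.3)
)

# PCA on the correlation matrix, so every predictor has unit variance.
pc <- princomp(preds, cor = TRUE)

# Proportion of variance explained by each component, as in summary(pc).
prop_var <- pc$sdev^2 / sum(pc$sdev^2)
round(cumsum(prop_var), 3)

# Loadings: how strongly each predictor contributes to each component.
loadings(pc)
```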

Model exploration

## 
## Call:
## lm(formula = sub_rep ~ z_pub_year + subfield + open_data + open_mat + 
##     stanford + change_platform + z_log_ratio_ss + is_within + 
##     single_vignette + z_log_sample + z_log_trials, data = data_tier1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7511 -1.3244 -0.2346  1.3409  3.4323 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3.218252   0.407093   7.905 3.89e-13 ***
## z_pub_year           0.094443   0.143961   0.656  0.51274    
## subfieldnon-psych   -0.094368   0.455422  -0.207  0.83611    
## subfieldother-psych  0.005194   0.403553   0.013  0.98975    
## subfieldsocial      -0.600773   0.345548  -1.739  0.08400 .  
## open_data            0.472727   0.357222   1.323  0.18759    
## open_mat            -0.508718   0.327028  -1.556  0.12176    
## stanford            -0.008220   0.442056  -0.019  0.98519    
## change_platform     -0.589066   0.288515  -2.042  0.04280 *  
## z_log_ratio_ss      -0.117647   0.157094  -0.749  0.45501    
## is_within            1.218763   0.381555   3.194  0.00169 ** 
## single_vignette     -0.328094   0.483279  -0.679  0.49818    
## z_log_sample        -0.091768   0.208241  -0.441  0.66003    
## z_log_trials        -0.350492   0.260318  -1.346  0.18005    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.633 on 162 degrees of freedom
## Multiple R-squared:  0.2007, Adjusted R-squared:  0.1365 
## F-statistic: 3.128 on 13 and 162 DF,  p-value: 0.0003479

Repeating regularized

## 11 x 1 sparse Matrix of class "dgCMatrix"
##                         s1
## (Intercept)      2.8428523
## z_pub_year       .        
## open_data        0.1085095
## open_mat         .        
## stanford         .        
## change_platform -0.4669277
## z_log_ratio_ss   .        
## is_within        0.9137496
## single_vignette -0.1819649
## z_log_sample     .        
## z_log_trials     .

## 11 x 1 sparse Matrix of class "dgCMatrix"
##                        s1
## (Intercept)     2.7619133
## z_pub_year      .        
## open_data       .        
## open_mat        .        
## stanford        .        
## change_platform .        
## z_log_ratio_ss  .        
## is_within       0.4362908
## single_vignette .        
## z_log_sample    .        
## z_log_trials    .
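The two sparse coefficient vectors above look like glmnet output at two penalty values, the second sparser than the first. glmnet itself isn’t shown; as a dependency-free sketch of what the L1 penalty does, here is a minimal coordinate-descent lasso in base R. It is a stand-in written for illustration, not glmnet’s implementation, and lasso_cd is a hypothetical name.

```r
# Soft-thresholding operator: shrinks z toward 0 by lambda, zeroing small values.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical minimal coordinate-descent lasso for a Gaussian outcome, minimizing
#   (1 / (2n)) * sum((y - b0 - X b)^2) + lambda * sum(abs(b)).
# Predictors are standardized inside the function, so the returned slopes
# are on the standardized scale (glmnet rescales them back; this does not).
lasso_cd <- function(X, y, lambda, max_iter = 1000, tol = 1e-10) {
  X <- scale(as.matrix(X))
  n <- nrow(X)
  b0 <- mean(y)               # intercept, since the columns of X are centred
  yc <- y - b0
  beta <- setNames(rep(0, ncol(X)), colnames(X))
  col_ms <- colSums(X^2) / n  # mean square of each column
  for (iter in seq_len(max_iter)) {
    beta_old <- beta
    for (j in seq_along(beta)) {
      # Partial residual: outcome minus the fit from every other predictor
      r_j <- yc - X[, -j, drop = FALSE] %*% beta[-j]
      beta[j] <- soft_threshold(sum(X[, j] * r_j) / n, lambda) / col_ms[j]
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  c("(Intercept)" = b0, beta)
}

# Simulated data only: y depends on x1; x2 and x3 are noise.
set.seed(4)
X <- matrix(rnorm(300 * 3), ncol = 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- 1 + 2 * X[, "x1"] + rnorm(300)
round(lasso_cd(X, y, lambda = 0.1), 3)  # small penalty: x1 kept, noise shrunk
round(lasso_cd(X, y, lambda = 5), 3)    # large penalty: every slope is zero
```

As the penalty grows, more coefficients are set exactly to zero, which is the same pattern as the sparser second vector above; at lambda = 0 the fit reduces to ordinary least squares on the standardized predictors.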

Bayesian tinkering

Let’s try super strong priors, consolidating a few very highly correlated variables.

And let’s see whether it’s the link function.

Models of clusters of predictors

## 
## Call:
## lm(formula = sub_rep ~ is_within + single_vignette + z_log_trials, 
##     data = data_tier1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7587 -1.3730 -0.3709  1.3848  2.8753 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       2.7485     0.3126   8.792 1.48e-15 ***
## is_within         1.2259     0.3624   3.383 0.000888 ***
## single_vignette  -0.7744     0.4353  -1.779 0.076969 .  
## z_log_trials     -0.4134     0.2221  -1.861 0.064390 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.652 on 172 degrees of freedom
## Multiple R-squared:  0.1317, Adjusted R-squared:  0.1165 
## F-statistic: 8.695 on 3 and 172 DF,  p-value: 2.106e-05

Individual predictor-outcome correlations

Sub_rep

Pred_int

Note: predInt may not be reliably calculated in some cases. Dealing with numbers is hard!

P_orig