library(papaja)
library(rmarkdown)
library(tidyverse) 
library(here)
library(glue)
library(metafor)
library(knitr)
library(gridExtra)
library(here)
library(heatmaply)
library(MuMIn)
library(glmulti)
library(PRISMAstatement)
library(PublicationBias)

DATA_PATH <- here("data/processed/syntactic_bootstrapping_tidy_data.csv") 
RAW_DATA_PATH <- here("data/raw/syntactic_bootstrapping_raw_data.csv")

ma_data <- read_csv(DATA_PATH) %>% mutate(row_id = 1:n())

ETA

For a chosen ratio of publication probabilities, eta, estimates a publication bias-corrected pooled point estimate and confidence interval per Mathur & VanderWeele (2020). Model options include fixed-effects (a.k.a. “common-effect”), robust independent, and robust clustered specifications

The ratio eta represents the number of times more likely affirmative studies (i.e., those with a “statistically significant” and positive estimate) are to be published than nonaffirmative studies (i.e., those with a “nonsignificant” or negative estimate).

eta <- data.frame()

for (i in 1:20){
  df <- corrected_meta(
      ma_data$d_calc,
      ma_data$d_var_calc,
      eta= i,
      clustervar = 1:length(ma_data$d_calc),
      model = "robust",
      selection.tails = 1,
      favor.positive = TRUE,
      alpha.select = 0.05,
      CI.level = 0.95,
      small = TRUE)
  eta <- bind_rows(eta, df)
  
}

It seems that only when affirmative studies were 15 times more likely to be published than the non-affirmative studies will we end up with a est = 0

eta %>% ggplot(aes(x = eta, y = pval)) + 
  geom_point() + 
  geom_line() + 
  geom_hline(yintercept = 0.05, color = "red", type = 2) + 
  labs(title = "eta vs pval")

eta %>% ggplot(aes(x = eta, y = est)) + 
  geom_point() + 
  geom_line() + 
  labs(title = "eta vs est")

Really don’t know what this plot is telling me but it’s from the R document. Only understand it’s trying different eta in a fancier way.

eta.list = as.list( c( 200, 150, 100, 50, 40, 30, 20, rev( seq(1,15,1) ) ) )
res.list = lapply(eta.list, function(x){
cat("\n Working on eta = ", x)
return(corrected_meta(
      ma_data$d_calc,
      ma_data$d_var_calc,
      eta= x,
      clustervar = 1:length(ma_data$d_calc),
      model = "robust",
      selection.tails = 1,
      favor.positive = TRUE,
      alpha.select = 0.05,
      CI.level = 0.95,
      small = TRUE) )
})
## 
##  Working on eta =  200
##  Working on eta =  150
##  Working on eta =  100
##  Working on eta =  50
##  Working on eta =  40
##  Working on eta =  30
##  Working on eta =  20
##  Working on eta =  15
##  Working on eta =  14
##  Working on eta =  13
##  Working on eta =  12
##  Working on eta =  11
##  Working on eta =  10
##  Working on eta =  9
##  Working on eta =  8
##  Working on eta =  7
##  Working on eta =  6
##  Working on eta =  5
##  Working on eta =  4
##  Working on eta =  3
##  Working on eta =  2
##  Working on eta =  1
res.df = as.data.frame(do.call("rbind",res.list)) 

ggplot( data = res.df, aes( x = eta, y = est ) ) +
geom_ribbon( data = res.df, aes( x = eta, ymin = lo, ymax = hi ), fill = "gray" ) +
geom_line( lwd = 1.2 ) +
xlab( bquote( eta ) ) +
ylab( bquote( hat(mu)[eta] ) ) +
theme_classic()

pval_plot

Plots the one-tailed p-values. The leftmost red line indicates the cutoff for one-tailed p-values less than 0.025 (corresponding to “affirmative” studies; i.e., those with a positive point estimate and a two-tailed p-value less than 0.05). The rightmost red line indicates one-tailed p-values greater than 0.975 (i.e., studies with a negative point estimate and a two-tailed p-value less than 0.05). If there is a substantial point mass of p-values to the right of the rightmost red line, this suggests that selection may be two-tailed rather than one-tailed. **

this doesn’t look good, “If publication bias favors “significant” results regardless of sign, one would expect to see an increased density of one-tailed p-values not only below 0.025 but also above 0.975 (because the latter corresponds to two-tailed p-values less than 0.05 with negative point estimates)"

pval_plot(ma_data$d_calc,
ma_data$d_var_calc, alpha.select = 0.05) 

significance_funnel

Creates a modified funnel plot that distinguishes between affirmative and nonaffirmative studies, helping to detect the extent to which the nonaffirmative studies’ point estimates are systematically smaller than the entire set of point estimates. The estimate among only nonaffirmative studies (gray diamond) represents a corrected estimate under worst-case publication bias. If the gray diamond represents a negligible effect size or if it is much smaller than the pooled estimate among all studies (black diamond), this suggests that the meta-analysis may not be robust to extreme publication bias. Numerical sensitivity analyses (via PublicationBias::svalue) should still be carried out for more precise quantitative conclusion

doesn’t look too bad?

significance_funnel(
ma_data$d_calc,
ma_data$d_var_calc,
xmin = min(ma_data$d_calc),
xmax = max(ma_data$d_calc),
ymin = 0,
ymax = max(sqrt(ma_data$d_var_calc)),
xlab = "Point estimate",
ylab = "Estimated standard error",
favor.positive = TRUE,
est.all = NA,
est.N = NA,
alpha.select = 0.05,
plot.pooled = TRUE
)

sval

Estimates the S-value, defined as the severity of publication bias (i.e., the ratio by which affirmative studies are more likely to be published than nonaffirmative studies) that would be required to shift the pooled point estimate or its confidence interval limit to the value q.

The function returns: - the amount of publication bias required to attenuate the pooled point estimate to q (sval.est), - the amount of publication bias required to attenuate the confidence interval limit of the pooled point estimate to q (sval.ci), - the number of affirmative and nonaffirmative studies after any needed recoding of signs (k.affirmative and k.nonaffirmative), and an indicator for whether the point estimates’ signs were recoded (signs.recoded).

Doesn’t seem to be too bad? Impossible to attenuate the pooled point estimate to 0, and for 15+ publication bias severity to render the interval limit?
cf. example on pg 24

sval <- data.frame()

for (i in seq(0, 0.1, by = 0.02)){
  s <- svalue(
          ma_data$d_calc,
          ma_data$d_var_calc,
          q = i,
          clustervar = 1:length(ma_data$d_calc),
          model = "robust",
          alpha.select = 0.05,
          eta.grid.hi = 200,
          favor.positive = TRUE,
          CI.level = 0.95,
          small = TRUE,
          return.worst.meta = FALSE
          )
  s <- s %>% mutate(q_val = i)

  sval <- bind_rows(sval, s) 
}

sval
##       sval.est   sval.ci k.affirmative k.nonaffirmative signs.recoded q_val
## 1 Not possible 15.331140            17               36         FALSE  0.00
## 2 Not possible  7.914667            17               36         FALSE  0.02
## 3 Not possible  5.172439            17               36         FALSE  0.04
## 4 Not possible  3.747877            17               36         FALSE  0.06
## 5 Not possible  2.878591            17               36         FALSE  0.08
## 6 Not possible  2.295787            17               36         FALSE  0.10