Read in data

We read in the data and do some basic cleaning.

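library(tidyverse)  # dplyr, tidyr, purrr, and the %>% pipe
library(janitor)    # clean_names()
library(corrr)      # correlate()

tta_data <- readxl::read_excel("Feas Submit to Activation Corr.xlsx") %>%
  clean_names() %>%
  mutate(year = lubridate::year(feasibility_approval_date)) %>%
  filter(new_or_migrated == "New") %>%
  filter(feasibility_approval_to_irb_submission > 0) %>%
  # explicit units so the difference is always expressed in days
  mutate(feasibility_approval_to_open = as.numeric(difftime(open_to_enrollment_date, feasibility_approval_date, units = "days")))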
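
A note on the date arithmetic: when two date-time (POSIXct) values are subtracted, R picks the unit of the result from the smallest difference, so the numbers are not guaranteed to be days. The sketch below uses two made-up timestamps (not values from the study data) to show the difference between plain subtraction and `difftime()` with explicit units. Whether this matters here depends on how the Excel date columns are read in, which we are assuming rather than confirming.

```r
# Two hypothetical opening timestamps measured against one approval timestamp
approval <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
opened   <- as.POSIXct(c("2020-01-01 12:00:00", "2020-01-11 12:00:00"), tz = "UTC")

# Plain subtraction: R chooses the unit from the smallest difference
auto_diff <- opened - approval
units(auto_diff)        # unit chosen automatically
as.numeric(auto_diff)   # values are in that unit, not necessarily days

# difftime() with explicit units always returns days
explicit_days <- as.numeric(difftime(opened, approval, units = "days"))
explicit_days
```

If the columns are of class Date rather than POSIXct, subtraction already returns days, and the explicit form is simply a safeguard.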

Factors Affecting Time to Activation

We know that there is a lot of variance in the time between feasibility approval and the IRB submission. We hypothesize that delayed IRB submission is a factor in longer time to activation. Let’s evaluate this.

We first excluded migrated studies, then excluded studies whose IRB submission came on or before feasibility approval, leaving us with 500 studies from 2017 through 2020.

tta_data %>% 
  select(feasibility_approval_to_irb_submission, feasibility_approval_to_open) %>% 
  correlate(quiet = TRUE) 
## # A tibble: 2 x 3
##   term            feasibility_approval_… feasibility_approv…
##   <chr>                            <dbl>               <dbl>
## 1 feasibility_ap…                 NA                   0.542
## 2 feasibility_ap…                  0.542              NA

The correlation is 0.542, a moderate positive relationship between the two intervals.
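
To put an uncertainty range around a correlation like this, base R's `cor.test()` returns a confidence interval along with the estimate. The sketch below runs on simulated data, so the variable names `irb_delay` and `days_to_open` and all the numbers are illustrative stand-ins, not the study data. It also checks the link between the correlation and a one-predictor linear model: the model's R-squared equals the squared correlation.

```r
set.seed(42)
n <- 500

# Simulated stand-ins for the two intervals (in days)
irb_delay    <- rexp(n, rate = 1 / 60)
days_to_open <- 170 + irb_delay + rnorm(n, sd = 100)

# Pearson correlation with a 95% confidence interval
ct <- cor.test(irb_delay, days_to_open)
ct$estimate
ct$conf.int

# With a single predictor, R-squared is the squared correlation
fit <- lm(days_to_open ~ irb_delay)
all.equal(unname(ct$estimate)^2, summary(fit)$r.squared)
```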

Let’s plot these to take a look.

Scatterplot

Comparing feasibility_approval_to_irb_submission and feasibility_approval_to_open.

## `geom_smooth()` using formula 'y ~ x'
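
The code for the scatterplot is not shown in this report. A plot of this kind could be sketched as follows, assuming ggplot2 and a straight-line smoother (`method = "lm"` is our guess, based only on the `geom_smooth()` message). The data frame `sim_data` is simulated so the example runs on its own. With the real data, `tta_data` would take its place.

```r
library(ggplot2)

set.seed(1)

# Simulated stand-in with the same column names as the report
sim_data <- data.frame(
  feasibility_approval_to_irb_submission = rexp(200, rate = 1 / 60)
)
sim_data$feasibility_approval_to_open <-
  170 + sim_data$feasibility_approval_to_irb_submission + rnorm(200, sd = 100)

# Scatterplot with a fitted straight line and its confidence band
p <- ggplot(sim_data,
            aes(x = feasibility_approval_to_irb_submission,
                y = feasibility_approval_to_open)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Days from feasibility approval to IRB submission",
       y = "Days from feasibility approval to open")
p
```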

The plot shows the correlation: a longer delay to IRB submission is associated with a longer time to open.

Modeling

Let’s look at the slope and intercept.

tta_data %>% 
  lm(feasibility_approval_to_open ~ feasibility_submission_to_irb_submission, data = . ) %>% 
  broom::tidy()
## # A tibble: 2 x 5
##   term                 estimate std.error statistic  p.value
##   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)            171.      6.62        25.8 7.34e-94
## 2 feasibility_submiss…     1.07    0.0752      14.2 8.05e-39

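The intercept gives an overall baseline of 171 days to open when there is no IRB submission delay. Delayed IRB submission is a significant predictor (p = 8.05e-39): each additional day of delay is associated with 1.07 more days to open. Note that this model uses feasibility_submission_to_irb_submission as the predictor, whereas the correlation and scatterplot above used feasibility_approval_to_irb_submission.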
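
As a worked example of what these coefficients imply, the fitted line can be evaluated at a few delays. The coefficients are the rounded values from the output above. The delays of 0, 30, 60 and 90 days are hypothetical values chosen for illustration.

```r
# Coefficients as reported in the tidy() output (rounded)
intercept <- 171    # days to open when the IRB submission delay is zero
slope     <- 1.07   # additional days to open per day of delay

# Hypothetical delays, in days
delay_days <- c(0, 30, 60, 90)

# Predicted days to open from the fitted line
predicted_open <- intercept + slope * delay_days
data.frame(delay_days, predicted_open)
```

Under this model, a study with a 30-day delay would be expected to open in about 203 days. These are point predictions only. With a residual standard error of about 106 days, individual studies vary widely around the line.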

tta_data %>% 
  lm(feasibility_approval_to_open ~ feasibility_submission_to_irb_submission, data = . ) %>% 
  broom::glance()
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic  p.value    df
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>
## 1     0.289         0.288  106.      203. 8.05e-39     1
## # … with 6 more variables: logLik <dbl>, AIC <dbl>,
## #   BIC <dbl>, deviance <dbl>, df.residual <int>,
## #   nobs <int>

This IRB submission delay accounts for 28.9% of the variance in time to opening studies (R-squared = 0.289, adjusted 0.288), suggesting that this is a major factor.

Across 2017-2020, the baseline time to open is 171 days. For every day of IRB submission delay, it takes another 1.07 days to open to accrual. This factor is highly significant in the model and accounts for 29% of the variance in time to open.

Modeling by year

Let’s look at whether this effect has changed over time.

by_year <- tta_data %>% 
  group_by(year) %>% 
  nest()

year_model <- function(df){
  lm(feasibility_approval_to_open ~ feasibility_submission_to_irb_submission, data = df)
}

by_year <- by_year %>% 
  mutate(model = map(data, year_model))

by_year %>% 
  mutate(glance = map(model, broom::glance)) %>% 
  unnest(glance)
## # A tibble: 4 x 15
## # Groups:   year [4]
##    year data  model r.squared adj.r.squared sigma statistic
##   <dbl> <lis> <lis>     <dbl>         <dbl> <dbl>     <dbl>
## 1  2017 <tib… <lm>      0.275         0.269 135.       50.8
## 2  2018 <tib… <lm>      0.211         0.205  97.6      40.1
## 3  2019 <tib… <lm>      0.381         0.376  90.9      86.6
## 4  2020 <tib… <lm>      0.295         0.284  64.8      28.0
## # … with 8 more variables: p.value <dbl>, df <dbl>,
## #   logLik <dbl>, AIC <dbl>, BIC <dbl>, deviance <dbl>,
## #   df.residual <int>, nobs <int>

The share of variance in time to open explained by IRB submission delay varies across years, from 21.1% in 2018 to 38.1% in 2019, but it is consistently a large component.

Changes in IRB Submission Delay Effect over Time

by_year %>% 
  mutate(tidy = map(model, broom::tidy)) %>% 
  unnest(tidy) %>% 
  arrange(term)
## # A tibble: 8 x 8
## # Groups:   year [4]
##    year data  model term  estimate std.error statistic
##   <dbl> <lis> <lis> <chr>    <dbl>     <dbl>     <dbl>
## 1  2017 <tib… <lm>  (Int…  161.      17.5         9.16
## 2  2018 <tib… <lm>  (Int…  176.      11.0        16.0 
## 3  2019 <tib… <lm>  (Int…  200.      10.3        19.4 
## 4  2020 <tib… <lm>  (Int…  125.      11.6        10.7 
## 5  2017 <tib… <lm>  feas…    1.40     0.197       7.13
## 6  2018 <tib… <lm>  feas…    0.930    0.147       6.33
## 7  2019 <tib… <lm>  feas…    0.885    0.0951      9.31
## 8  2020 <tib… <lm>  feas…    0.978    0.185       5.30
## # … with 1 more variable: p.value <dbl>

The baseline time to activation steadily climbed from 161 days to 200 days from 2017-2019, then fell to 125 days in 2020.

The effect of delayed IRB submission fell from 1.40 days per day of delay in 2017 to 0.930 in 2018 and 0.885 in 2019, then rose slightly to 0.978 in 2020. A value below 1 would suggest that we are doing parallel tasks (budgeting, contracting) more effectively while waiting for an IRB submission, although the standard errors (roughly 0.10 to 0.19) are large enough that the 2018-2020 slopes are not clearly different from 1.
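
To judge how far each yearly slope is from 1, approximate 95% confidence intervals can be computed from the estimates and standard errors in the table above. This sketch uses a normal approximation, which is our simplification. A t-based interval from the fitted models (for example via `confint()`) would be slightly wider.

```r
# Slopes and standard errors copied from the by-year tidy() output
slopes <- data.frame(
  year      = c(2017, 2018, 2019, 2020),
  estimate  = c(1.40, 0.930, 0.885, 0.978),
  std_error = c(0.197, 0.147, 0.0951, 0.185)
)

# Approximate 95% intervals (normal approximation)
z <- qnorm(0.975)
slopes$lower <- slopes$estimate - z * slopes$std_error
slopes$upper <- slopes$estimate + z * slopes$std_error

# Does the interval contain a slope of exactly 1?
slopes$includes_one <- slopes$lower <= 1 & slopes$upper >= 1
slopes
```

On these rounded figures, only the 2017 interval lies entirely above 1, and only narrowly. The intervals for 2018 through 2020 all contain 1.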

Does Year Matter?

Is year of feasibility approval a significant predictor after adjusting for IRB submission delay?

tta_data %>% 
  lm(feasibility_approval_to_open ~ feasibility_submission_to_irb_submission + year, data = .) %>% 
  broom::tidy()
## # A tibble: 3 x 5
##   term                 estimate std.error statistic  p.value
##   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          25437.   9430.          2.70 7.22e- 3
## 2 feasibility_submiss…     1.06    0.0748     14.2  1.00e-38
## 3 year                   -12.5     4.67       -2.68 7.62e- 3

Yes. After adjusting for IRB submission delay (which adds 1.06 days per day of delay), the time to open has decreased by 12.5 days per year (p = 0.00762) over the period 2017-2020.
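
The intercept of 25437 in this model is hard to read because it refers to year 0. One common option is to center year on the first study year, so the intercept becomes the expected time to open in 2017 with no IRB delay. The sketch below shows this on simulated data. The data frame `sim` and its coefficients are made up for illustration and are not estimates from the study.

```r
set.seed(7)
n <- 400

# Simulated stand-in data: a 12-day yearly decline plus noise
year         <- sample(2017:2020, n, replace = TRUE)
irb_delay    <- rexp(n, rate = 1 / 60)
days_to_open <- 200 + irb_delay - 12 * (year - 2017) + rnorm(n, sd = 80)
sim <- data.frame(year, irb_delay, days_to_open)

# Same model, with year left as-is and with year centered on 2017
raw_fit      <- lm(days_to_open ~ irb_delay + year, data = sim)
centered_fit <- lm(days_to_open ~ irb_delay + I(year - 2017), data = sim)

coef(raw_fit)       # intercept refers to year 0
coef(centered_fit)  # intercept refers to 2017; slopes are unchanged
```

Centering changes only the intercept. The slopes for delay and year, their standard errors, and the model fit stay the same.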

Has Delay to IRB Submission Changed over Time?

tta_data %>% 
  lm(feasibility_submission_to_irb_submission ~ year, 
     data = .) %>% 
  broom::tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  4643.     5643.       0.823   0.411
## 2 year           -2.27      2.80    -0.812   0.417

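The estimated delay to IRB submission decreased by 2.27 days per year over the period 2017-2020, but this trend is not statistically significant (p = 0.417), so there is no clear evidence that the delay has changed.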