and some basic data cleaning
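# Read the export, standardize column names, and derive analysis variables
tta_data <- readxl::read_excel("Feas Submit to Activation Corr.xlsx") %>%
  clean_names() %>%
  mutate(year = lubridate::year(feasibility_approval_date)) %>%
  # keep new (non-migrated) studies only
  filter(new_or_migrated == "New") %>%
  # keep studies whose IRB submission came after feasibility approval
  filter(feasibility_approval_to_irb_submission > 0) %>%
  # request days explicitly so the difftime units cannot silently change
  mutate(feasibility_approval_to_open = as.numeric(difftime(open_to_enrollment_date, feasibility_approval_date, units = "days")))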
We know that there is a lot of variance in the time between feasibility approval and the IRB submission. We hypothesize that delayed IRB submission is a factor in longer time to activation. Let’s evaluate this.
We first removed migrated studies, then removed studies whose IRB submission did not come after feasibility approval (a delay of zero or fewer days), leaving us with 500 studies with feasibility approval from 2017 to 2020.
tta_data %>%
  select(feasibility_approval_to_irb_submission, feasibility_approval_to_open) %>%
  correlate(quiet = TRUE)
## # A tibble: 2 x 3
## term feasibility_approval_… feasibility_approv…
## <chr> <dbl> <dbl>
## 1 feasibility_ap… NA 0.542
## 2 feasibility_ap… 0.542 NA
The correlation is 0.542, indicating a moderate positive relationship between the two intervals.
Let’s plot these to take a look.
[Figure: Comparing feasibility_approval_to_irb_submission and feasibility_approval_to_open, with a `geom_smooth()` fit using formula 'y ~ x']
The plot shows the correlation: time to open increases with longer delays to IRB submission.
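The plotting code is not shown in this report. A minimal sketch of how such a plot could be drawn is below. It assumes ggplot2 and a linear smoother, which matches the `geom_smooth()` message but is not confirmed by the source. The data frame `sim` holds simulated stand-in values, not the study data.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical values, NOT the study data)
set.seed(1)
sim <- data.frame(
  feasibility_approval_to_irb_submission = rexp(200, rate = 1 / 60)
)
sim$feasibility_approval_to_open <-
  170 + 1.0 * sim$feasibility_approval_to_irb_submission + rnorm(200, sd = 100)

# Scatterplot with a fitted straight line and its confidence band
p <- ggplot(sim, aes(x = feasibility_approval_to_irb_submission,
                     y = feasibility_approval_to_open)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    title = "Comparing feasibility_approval_to_irb_submission and feasibility_approval_to_open",
    x = "Days from feasibility approval to IRB submission",
    y = "Days from feasibility approval to open"
  )
p
```

Replacing `sim` with `tta_data` would apply the same layers to the real data.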
Let’s look at the slope and intercept.
tta_data %>%
  lm(feasibility_approval_to_open ~ feasibility_submission_to_irb_submission, data = . ) %>%
  broom::tidy()
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 171. 6.62 25.8 7.34e-94
## 2 feasibility_submiss… 1.07 0.0752 14.2 8.05e-39
The intercept gives an overall baseline of 171 days to open when there is no IRB submission delay. Delayed IRB submission has a highly significant effect (p = 8.05e-39), adding 1.07 days to open for every day of IRB submission delay.
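To make the fitted line concrete, the rounded coefficients from the table above can be turned into a small prediction helper. This is an illustrative sketch: `predict_days_to_open` is a hypothetical name, and the delays of 0, 30, and 90 days are arbitrary example inputs.

```r
# Coefficients copied from the rounded tidy() output above
intercept <- 171   # predicted days to open with zero delay
slope     <- 1.07  # additional days to open per day of IRB submission delay

# Hypothetical helper: predicted days to open for a given delay
predict_days_to_open <- function(delay_days) {
  intercept + slope * delay_days
}

predict_days_to_open(c(0, 30, 90))
# 171.0 203.1 267.3
```

Because the coefficients are rounded, these are approximations of what `predict()` on the fitted model would return.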
tta_data %>%
  lm(feasibility_approval_to_open ~ feasibility_submission_to_irb_submission, data = . ) %>%
  broom::glance()
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.289 0.288 106. 203. 8.05e-39 1
## # … with 6 more variables: logLik <dbl>, AIC <dbl>,
## # BIC <dbl>, deviance <dbl>, df.residual <int>,
## # nobs <int>
This IRB submission delay accounts for about 29% of the variance in time to open (r.squared = 0.289, adj.r.squared = 0.288), suggesting that it is a major factor.
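In a regression with a single predictor, r.squared equals the square of the Pearson correlation between predictor and outcome. The sketch below checks this on simulated stand-in data (hypothetical values, not the study data). Note that the correlation of 0.542 reported earlier was computed on feasibility_approval_to_irb_submission, while the models use feasibility_submission_to_irb_submission. If these are different columns, as the names suggest, 0.542 squared (about 0.294) need not match 0.289 exactly.

```r
# Simulated stand-in data (hypothetical values, NOT the study data)
set.seed(42)
n <- 500
delay <- rexp(n, rate = 1 / 60)
days_to_open <- 170 + 1.0 * delay + rnorm(n, sd = 100)

# Pearson correlation and the R-squared of the one-predictor model
r <- cor(delay, days_to_open)
fit <- lm(days_to_open ~ delay)
r_squared <- summary(fit)$r.squared

# The two quantities agree up to floating-point error
all.equal(r^2, r_squared)
```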
Across 2017-2020, the baseline time to open is 171 days. For every day between feasibility approval and IRB submission, it takes another 1.07 days to open to accrual. This factor is highly significant in the model and accounts for 29% of the variance in time to open.
Let’s look at whether this effect has changed over time.
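# One row per year, with that year's studies nested in a list-column
by_year <- tta_data %>%
  group_by(year) %>%
  nest()

# Fit the same one-predictor model to a single year's data
year_model <- function(df) {
  lm(feasibility_approval_to_open ~ feasibility_submission_to_irb_submission, data = df)
}

# Store one fitted model per year
by_year <- by_year %>%
  mutate(model = map(data, year_model))

# Model-level summaries (r.squared etc.) for each year
by_year %>%
  mutate(glance = map(model, broom::glance)) %>%
  unnest(glance)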
## # A tibble: 4 x 15
## # Groups: year [4]
## year data model r.squared adj.r.squared sigma statistic
## <dbl> <lis> <lis> <dbl> <dbl> <dbl> <dbl>
## 1 2017 <tib… <lm> 0.275 0.269 135. 50.8
## 2 2018 <tib… <lm> 0.211 0.205 97.6 40.1
## 3 2019 <tib… <lm> 0.381 0.376 90.9 86.6
## 4 2020 <tib… <lm> 0.295 0.284 64.8 28.0
## # … with 8 more variables: p.value <dbl>, df <dbl>,
## # logLik <dbl>, AIC <dbl>, BIC <dbl>, deviance <dbl>,
## # df.residual <int>, nobs <int>
The share of variance in time to open explained by IRB submission delay varies across years, from 21.1% in 2018 to 38.1% in 2019, but it is consistently a large component.
by_year %>%
  mutate(tidy = map(model, broom::tidy)) %>%
  unnest(tidy) %>%
  arrange(term)
## # A tibble: 8 x 8
## # Groups: year [4]
## year data model term estimate std.error statistic
## <dbl> <lis> <lis> <chr> <dbl> <dbl> <dbl>
## 1 2017 <tib… <lm> (Int… 161. 17.5 9.16
## 2 2018 <tib… <lm> (Int… 176. 11.0 16.0
## 3 2019 <tib… <lm> (Int… 200. 10.3 19.4
## 4 2020 <tib… <lm> (Int… 125. 11.6 10.7
## 5 2017 <tib… <lm> feas… 1.40 0.197 7.13
## 6 2018 <tib… <lm> feas… 0.930 0.147 6.33
## 7 2019 <tib… <lm> feas… 0.885 0.0951 9.31
## 8 2020 <tib… <lm> feas… 0.978 0.185 5.30
## # … with 1 more variable: p.value <dbl>
The baseline time to activation climbed steadily from 161 days in 2017 to 200 days in 2019, then fell to 125 days in 2020.
The effect of delayed IRB submission fell from 1.40 days per day of delay in 2017 to as low as 0.885 days in 2019, and was 0.978 days per day of delay in 2020. A slope below 1 suggests that we are doing parallel tasks (budgeting, contracting) more effectively while waiting for an IRB submission.
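Because both the intercept and the slope change by year, it can help to compare predicted time to open at a common delay. The sketch below uses the rounded per-year coefficients from the table above. The 30-day delay is a hypothetical value chosen only for illustration.

```r
# Rounded per-year coefficients copied from the tidy() output above
by_year_coefs <- data.frame(
  year      = c(2017, 2018, 2019, 2020),
  intercept = c(161, 176, 200, 125),
  slope     = c(1.40, 0.930, 0.885, 0.978)
)

delay_days <- 30  # hypothetical delay, for illustration only

# Predicted days to open in each year at that delay
by_year_coefs$predicted_days_to_open <-
  by_year_coefs$intercept + by_year_coefs$slope * delay_days

by_year_coefs
# predicted_days_to_open: 203.00 203.90 226.55 154.34
```

At this delay the 2017 and 2018 predictions are nearly identical, so the lower 2018 slope is offset by its higher intercept.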
Is year of feasibility approval a significant predictor after adjusting for IRB submission delay?
tta_data %>%
  lm(feasibility_approval_to_open ~ feasibility_submission_to_irb_submission + year, data = .) %>%
  broom::tidy()
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 25437. 9430. 2.70 7.22e- 3
## 2 feasibility_submiss… 1.06 0.0748 14.2 1.00e-38
## 3 year -12.5 4.67 -2.68 7.62e- 3
Yes. After adjusting for IRB submission delay (which adds 1.06 days to open per day of delay over 2017-2020), the time to open has gone down by 12.5 days per year (p = 0.00762) over the period 2017-2020.
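The very large intercept in this model (25437) is the extrapolated value at year 0, since year enters the model as a raw calendar number. A common alternative, not used in this report, is to center year on the first year so that the intercept reads as the baseline for 2017. The sketch below shows on simulated stand-in data (hypothetical values, not the study data) that centering changes only the intercept, not the slopes.

```r
# Simulated stand-in data (hypothetical values, NOT the study data)
set.seed(7)
n <- 500
year <- sample(2017:2020, n, replace = TRUE)
delay <- rexp(n, rate = 1 / 60)
days_to_open <- 200 + 1.0 * delay - 12 * (year - 2017) + rnorm(n, sd = 100)

# Same model with raw year and with year centered on 2017
fit_raw      <- lm(days_to_open ~ delay + year)
fit_centered <- lm(days_to_open ~ delay + I(year - 2017))

coef(fit_raw)       # intercept is an extrapolation to year 0
coef(fit_centered)  # intercept is the zero-delay baseline in 2017
```

The delay and year slopes are identical in the two fits, so the conclusions about the yearly trend are unaffected by the choice.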
tta_data %>%
  lm(feasibility_submission_to_irb_submission ~ year, data = .) %>%
  broom::tidy()
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 4643. 5643. 0.823 0.411
## 2 year -2.27 2.80 -0.812 0.417
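The estimated change in IRB submission delay over the period 2017-2020 is a decrease of 2.27 days per year, but this trend is not statistically significant (p = 0.417), so we cannot conclude that IRB submission delay has changed over time.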
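A rough 95% confidence interval for the year coefficient helps show how uncertain this trend is. The sketch below uses the rounded estimate and standard error from the table above and a normal approximation, which is an assumption. With a sample of this size a t-based interval would be nearly the same.

```r
# Rounded values copied from the tidy() output above
estimate  <- -2.27  # change in IRB submission delay, days per year
std_error <- 2.80

# Approximate 95% interval using the normal quantile (assumption)
z  <- qnorm(0.975)
ci <- estimate + c(-1, 1) * z * std_error
round(ci, 2)
# -7.76  3.22
```

The interval spans zero, which is consistent with the p-value of 0.417 for this coefficient.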