1 Eliminate duplicate responses

First, we will check whether the duplicated responses are consistent. If they are not, we will retain the response in which EDC use was marked as “Yes,” because a respondent who answered “No” may not have been aware of EDC use in the clinical trial. If the responses are concordant, the second response will be removed. We will use the following code segment for this.

In the first segment, we make a data frame of duplicated trials with serial numbers of responses.

In the second segment, we make a data frame of the first responses and then merge it with the data frame comprising the list of duplicated trials. For each row, we create a variable that selects the serial number of the response with EDC use = Yes if the responses are discordant. The final two code chunks make a data frame with trials for which a single response has been obtained.
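The rule described above can be sketched in base R as follows. This is a minimal illustration, not the code used in the analysis: the data, the column names (`serial`, `trial_id`, `edc_use`) and the helper `pick_serial()` are all hypothetical, and it resolves duplicates group by group rather than through the merge described above.

```r
# Hypothetical survey responses: column names and values are illustrative only.
responses <- data.frame(
  serial   = 1:7,
  trial_id = c("T1", "T2", "T2", "T3", "T3", "T4", "T4"),
  edc_use  = c("Yes", "No", "Yes", "Yes", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Resolve one trial's responses (rows assumed ordered by serial number):
# keep the first "Yes" if the answers disagree, otherwise keep the first response.
pick_serial <- function(serial, edc_use) {
  discordant <- length(unique(edc_use)) > 1
  if (discordant) serial[edc_use == "Yes"][1] else serial[1]
}

responses <- responses[order(responses$trial_id, responses$serial), ]
keep <- mapply(
  pick_serial,
  split(responses$serial, responses$trial_id),
  split(responses$edc_use, responses$trial_id)
)

# One row per trial after resolving duplicates.
deduplicated <- responses[responses$serial %in% keep, ]
deduplicated
```

Trials with a single response pass through unchanged, because their only serial number is also their first.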

2 Statistical Analysis Plan

2.1 Participation Statistics

The following table shows the statistics related to participation in the survey. The number of investigators contacted differs from the number of trials because more than one investigator may have been contacted for a single trial.

2.2 Included trial characteristics

First, we look at the characteristics of all trials that were in the sampling frame.

Next we compare the trial characteristics of those studies which have responded versus those which have not.

2.3 EDC Adoption Rate

EDC Adoption Rate (EAR): The primary outcome measure is EAR. This will be defined as the ratio of the number of CTRI registered trials that use an EDC with sophistication level 2 or more to the number of participating trials (unique CTRI registered trials for which investigators agreed to participate in the study). The proportion and its binomial 95% confidence interval will be reported.
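As a rough illustration of this calculation, the sketch below computes a proportion and a binomial 95% confidence interval in base R. The counts are made up for the example and are not survey results. The plan only says "binomial", so using the exact (Clopper-Pearson) interval returned by `binom.test()` is an assumption.

```r
# Hypothetical counts, for illustration only (not survey results).
n_participating <- 50   # unique trials whose investigators agreed to participate
n_edc_level2    <- 20   # of those, trials using an EDC at sophistication level >= 2

ear <- n_edc_level2 / n_participating

# binom.test() returns the exact (Clopper-Pearson) interval; the choice of the
# exact method is an assumption, since the plan does not name one.
ci <- binom.test(n_edc_level2, n_participating, conf.level = 0.95)$conf.int

cat(sprintf("EAR = %.1f%% (95%% CI %.1f%% to %.1f%%)\n",
            100 * ear, 100 * ci[1], 100 * ci[2]))
```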

The EDC sophistication level is defined as follows:

  • Level 1: There is a unique account and password for each user to access the online system.

  • Level 2: Sites enter subject visit data through a Web interface into electronic case report forms (eCRFs). The completion status of each eCRF for each subject can be tracked automatically online. The system provides an audit trail for all data entry and data modification.

  • Level 3: Data validation happens automatically when data are entered into the eCRF. The system will automatically log the user off after a period of inactivity.

  • Level 4: Subjects are randomized automatically.

  • Level 5: Subject recruitment can be tracked online for each site.

  • Level 6: The system allows tracking of medication inventory at the sites.

For a level to be considered complete, all of its questions must be marked as Yes. If one of the questions in a level is marked as No and a higher level is marked Yes, the higher level will be taken. For each unique trial, we will therefore calculate the highest EDC sophistication level. If EDC is not used, the sophistication level will be marked as missing.
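The scoring rule above can be sketched as a small function. The function name `highest_level()`, the layout of `answers`, and the number of questions per level are hypothetical. The plan does not say how to code a trial that uses EDC but completes no level, so returning 0 in that case is an assumption.

```r
# Highest level whose questions were all answered "Yes".
# `answers` is a list with one logical vector per level (TRUE = "Yes").
highest_level <- function(edc_used, answers) {
  if (!edc_used) return(NA_integer_)            # EDC not used: level is missing
  complete <- vapply(answers, all, logical(1))  # complete only if every item is TRUE
  if (!any(complete)) return(0L)                # assumption: no complete level is coded 0
  max(which(complete))
}

# Hypothetical trial: level 2 is incomplete (one "No") but level 3 is complete,
# so the higher level is taken.
answers <- list(
  level1 = TRUE,
  level2 = c(TRUE, FALSE, TRUE),
  level3 = c(TRUE, TRUE),
  level4 = FALSE,
  level5 = FALSE,
  level6 = FALSE
)

highest_level(TRUE, answers)
highest_level(FALSE, answers)
```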

The following table shows the EDC adoption rate and the different levels in the trials for which responses were received in the survey.

The following table shows the breakdown of key trial characteristics by EDC adoption status. Groups were compared using the chi-square test for categorical variables and the Wilcoxon rank-sum test for continuous variables.
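The two tests named above are available in base R. The sketch below runs them on simulated data, so the variable names and the resulting p-values are illustrative only and say nothing about the survey. Note that `chisq.test()` applies Yates' continuity correction to 2 x 2 tables by default. Whether the original table used that correction is not stated.

```r
# Simulated data, for illustration only.
set.seed(1)
n <- 200
trials <- data.frame(
  edc_adoption    = factor(rep(c("Yes", "No"), each = n / 2)),
  industry_funded = factor(sample(c("Yes", "No"), n, replace = TRUE)),
  sample_size     = round(rlnorm(n, meanlog = 5, sdlog = 1))
)

# Categorical characteristic: Pearson chi-square test on the 2 x 2 table.
chisq_res <- chisq.test(table(trials$industry_funded, trials$edc_adoption))

# Continuous characteristic: Wilcoxon rank-sum test between adoption groups.
wilcox_res <- wilcox.test(sample_size ~ edc_adoption, data = trials)

c(chi_square_p = chisq_res$p.value, wilcoxon_p = wilcox_res$p.value)
```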

2.4 Influence of trial parameters on EAR

To determine the influence of the trial parameters on EAR, we will use a logistic regression model where the dependent variable will be EDC adoption with EDC sophistication level 2 or more (modeled as Yes or No). Independent variables will be:
1. Trial sponsor: Industry or Investigator-Initiated. Trials whose primary sponsor is a pharmaceutical company or device manufacturer will be considered industry-sponsored, and the rest will be considered investigator-initiated.
2. Trial sample size: Total trial sample size will be modeled as a continuous variable. To relax the linearity assumption, this will be expanded using a restricted cubic spline with 3 knots.
3. Trial sites: The number of sites will also be modeled as a continuous variable. Again to relax the linearity assumptions, the model term will be expanded using a restricted cubic spline with three knots.

Interactions will be tested in an omnibus model containing all interaction terms. A Wald test will be used to determine the significance of any interaction. Odds ratios with 95% confidence intervals will be reported.
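A minimal sketch of such an omnibus model and a pooled Wald test for the interaction terms is given below, using base R and the `splines` package. Everything in it is an assumption made for illustration: the data are simulated, the helpers `rcs3()` and `wald_test()` are hypothetical, and the knots are placed at the 10th, 50th and 90th percentiles, which the plan does not specify. A natural cubic spline from `splines::ns()` is used here as the restricted cubic spline.

```r
library(splines)

# Simulated data, for illustration only.
set.seed(42)
n <- 400
d <- data.frame(
  sample_size     = round(rlnorm(n, meanlog = 5, sdlog = 1)),
  sites           = sample(1:30, n, replace = TRUE),
  industry_funded = rbinom(n, 1, 0.4)
)
d$edc_adoption <- rbinom(n, 1, plogis(-1.5 + 0.08 * d$sites + 0.5 * d$industry_funded))

# Restricted (natural) cubic spline with 3 knots; knot positions are an assumption.
rcs3 <- function(x) {
  q <- quantile(x, c(0.10, 0.50, 0.90))
  ns(x, knots = q[2], Boundary.knots = q[c(1, 3)])
}

# Omnibus model: main effects plus all pairwise interactions.
full <- glm(edc_adoption ~ (rcs3(sample_size) + rcs3(sites) + industry_funded)^2,
            family = binomial, data = d)

# Pooled Wald chi-square test for a set of coefficients.
wald_test <- function(fit, idx) {
  b <- coef(fit)[idx]
  V <- vcov(fit)[idx, idx, drop = FALSE]
  stat <- as.numeric(t(b) %*% solve(V, b))
  c(chisq = stat, df = length(b),
    p = pchisq(stat, df = length(b), lower.tail = FALSE))
}

# Interaction coefficients are the ones with ":" in their names.
interaction_terms <- grep(":", names(coef(full)), fixed = TRUE)
wald_result <- wald_test(full, interaction_terms)
wald_result
```

The Wald statistics reported next come from the survey data, not from this simulated example.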

                Wald Statistics          Response: edc_adoption 

 Factor                                                       Chi-Square d.f. P     
 sample_size  (Factor+Higher Order Factors)                    6.30      6    0.3905
  All Interactions                                             2.62      4    0.6233
  Nonlinear (Factor+Higher Order Factors)                      2.75      3    0.4315
 sites  (Factor+Higher Order Factors)                         11.31      3    0.0101
  All Interactions                                             1.39      2    0.4994
 industry_funded  (Factor+Higher Order Factors)                1.83      3    0.6095
  All Interactions                                             0.91      2    0.6335
 sample_size * sites  (Factor+Higher Order Factors)            1.39      2    0.4994
  Nonlinear                                                    0.46      1    0.4958
  Nonlinear Interaction : f(A,B) vs. AB                        0.46      1    0.4958
 sample_size * industry_funded  (Factor+Higher Order Factors)  0.91      2    0.6335
  Nonlinear                                                    0.81      1    0.3674
  Nonlinear Interaction : f(A,B) vs. AB                        0.81      1    0.3674
 TOTAL NONLINEAR                                               2.75      3    0.4315
 TOTAL INTERACTION                                             2.62      4    0.6233
 TOTAL NONLINEAR + INTERACTION                                 3.65      5    0.6008
 TOTAL                                                        24.95      8    0.0016

As the above ANOVA results show, the Wald tests for the non-linear terms and for the interactions are not significant. Hence we show the simplified model without the interaction terms and without the non-linear terms. The table below shows the results of the logistic regression analysis.
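For readers who want to see the shape of such a simplified model, the sketch below fits a main-effects logistic regression on simulated data and reports odds ratios with Wald 95% confidence intervals. The data and variable names are hypothetical, and rescaling sample size to units of 100 participants is a presentation choice made here, not something taken from the plan.

```r
# Simulated data, for illustration only.
set.seed(42)
n <- 400
d <- data.frame(
  sample_size     = round(rlnorm(n, meanlog = 5, sdlog = 1)),
  sites           = sample(1:30, n, replace = TRUE),
  industry_funded = rbinom(n, 1, 0.4)
)
d$edc_adoption <- rbinom(n, 1, plogis(-1.5 + 0.08 * d$sites + 0.5 * d$industry_funded))

# Express sample size per 100 participants so the odds ratio is readable.
d$sample_size_100 <- d$sample_size / 100

# Main-effects model: linear terms only, no interactions.
fit <- glm(edc_adoption ~ sample_size_100 + sites + industry_funded,
           family = binomial, data = d)

# Odds ratios with Wald 95% confidence intervals.
or_table <- exp(cbind(OR = coef(fit), confint.default(fit, level = 0.95)))
round(or_table, 3)
```

`confint.default()` gives Wald intervals, which matches the Wald-based testing described above. Profile-likelihood intervals would be a different choice.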

2.5 EDC Sophistication Level

We will provide data on the median EDC sophistication levels as well as a plot showing the proportion of CTRI registered trials with different levels of EDC sophistication. Further visualization and analysis will also explore the association between trial sample size, number of trial sites, and type of trial sponsorship with EDC sophistication.

In the following table we will show the univariable analysis of the factors which influenced EDC sophistication level. We will dichotomize the level into two categories (score 6 or score 1-5).
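The dichotomization and a univariable analysis could look like the sketch below. The plan does not name the univariable method, so fitting one logistic regression per factor is an assumption. The data and variable names are simulated and hypothetical.

```r
# Simulated data for trials that use EDC, for illustration only.
set.seed(7)
n <- 150
edc <- data.frame(
  level           = sample(1:6, n, replace = TRUE),
  sample_size     = round(rlnorm(n, meanlog = 5, sdlog = 1)),
  sites           = sample(1:30, n, replace = TRUE),
  industry_funded = rbinom(n, 1, 0.4)
)

# Dichotomize: level 6 versus levels 1-5.
edc$level6 <- as.integer(edc$level == 6)

# One logistic regression per factor (assumed method), reported as odds ratios.
predictors <- c("sample_size", "sites", "industry_funded")
univariable <- do.call(rbind, lapply(predictors, function(v) {
  fit <- glm(reformulate(v, response = "level6"), family = binomial, data = edc)
  ci  <- confint.default(fit)[v, ]
  data.frame(predictor = v,
             OR    = exp(coef(fit)[[v]]),
             lower = exp(ci[[1]]),
             upper = exp(ci[[2]]))
}))
univariable
```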

3 Additional Analyses

Additionally, the survey collected data on alternative methods of data collection used in the trial, as well as a single-item question on the key perceived barriers to adoption of EDC in the trial.

Other reasons identified for not using EDC were:

Finally, two additional questions were asked about the trial center: whether it had access to a CTU and whether it had access to an IRB. We will evaluate the data in relation to EDC use.

4 Industry Sponsored trials

The percentage of industry-sponsored trials by year of registration is shown in the figure below.
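The percentages behind such a figure can be computed with a grouped mean. The sketch below uses a small made-up data frame. The column names `year` and `industry_funded` are hypothetical and the values are not registry data.

```r
# Made-up registry extract, for illustration only.
registry <- data.frame(
  year            = c(2018, 2018, 2018, 2019, 2019, 2020, 2020, 2020, 2020),
  industry_funded = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# The mean of a logical vector is the proportion of TRUE values.
pct_by_year <- 100 * tapply(registry$industry_funded, registry$year, mean)
round(pct_by_year, 1)
```

The resulting named vector can be passed to a plotting function to draw the yearly percentages.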

5 Packages used

  1. R : R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

  2. tidyverse : Wickham et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686

  3. gtsummary : Daniel D. Sjoberg, Michael Curry, Margie Hannum, Joseph Larmarange, Karissa Whiting and Emily C. Zabor (2021). gtsummary: Presentation-Ready Data Summary and Analytic Result Tables. <https://github.com/ddsjoberg/gtsummary>, http://www.danieldsjoberg.com/gtsummary/.

  4. Hmisc : Frank E Harrell Jr, with contributions from Charles Dupont and many others. (2021). Hmisc: Harrell Miscellaneous. https://hbiostat.org/R/Hmisc/, https://github.com/harrelfe/Hmisc/

  5. flextable : flextable: Functions for Tabular Reporting. https://ardata-fr.github.io/flextable-book/, https://davidgohel.github.io/flextable/.

  6. rms : Frank E Harrell Jr (2021). rms: Regression Modeling Strategies. https://hbiostat.org/R/rms/, https://github.com/harrelfe/rms.

  7. ggplot2 : H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

  8. lubridate : Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL https://www.jstatsoft.org/v40/i03/.

