Show how to build a simple KRI reporting workflow in R
Author
Dror Berel
Published
May 21, 2026
Overview
This tutorial shows how to build a simple key risk indicator (KRI) reporting workflow in R using the gsm ecosystem (https://gilead-biostats.github.io/gsm/).
As with all gsm workflows, there are many ways to accomplish the same end goal. This tutorial focuses on a straightforward approach that manually executes each step of the process, which can be helpful for users who are new to the framework and want to understand how the pieces fit together.
Disclaimer
Please refer to the package documentation and vignettes for more details on the individual functions and their parameters, as well as alternative approaches to workflow construction. The author is NOT a contributor to the gsm ecosystem and does NOT represent the official voice of the project or its maintainers. The tutorial is meant to be a practical example of how to use the gsm packages together, but it is not an official template or endorsed workflow.
Goal
The goal is to start from toy clinical analysis datasets, calculate two site-level rate metrics, convert those metric outputs into the reporting data model expected by gsm.kri, and then generate the inputs required for an HTML KRI report.
The tutorial uses:
- pharmaverseadam::adsl as the subject-level source
- pharmaverseadam::adae as the event-level source
- gsm.core for metric calculation
- gsm.mapping for long-format metadata construction
- gsm.reporting for report-model objects
- gsm.kri for charts and report generation
The two metrics in this document are not meant to be clinically meaningful production KRIs. They are compact examples that make the reporting pipeline easier to understand.
Note
The gsm framework also includes a workflow engine that can execute a structured workflow definition, such as the one shown in examples/kri_report_from_lworkflows.R. This tutorial focuses on the manual steps needed to calculate KRIs and prepare reporting data, but the same steps can be automated with the workflow engine once the user is comfortable with the individual pieces.
Setup
This section loads the required packages and example datasets.
What this section does:
- Loads the analysis, reporting, and visualization packages.
- Loads two toy ADaM datasets used in the rest of the tutorial.
Inputs:

| Input | Content | Purpose |
|---|---|---|
| Installed R packages | gsm, gsm.core, gsm.kri, gsm.mapping, gsm.reporting, tidyverse, and pharmaverseadam. | Provide the analysis, reporting, visualization, and toy-data functionality used throughout the tutorial. |
| Built-in example data | Example datasets shipped in pharmaverseadam. | Supply the toy clinical inputs used in the KRI calculations. |
Outputs:

| Object | Content | Purpose |
|---|---|---|
| adsl | Tibble-like data.frame containing subject-level records. | Used as the denominator population and as the source of site/study metadata. |
| adae | Tibble-like data.frame containing adverse-event-level records. | |
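The setup step described above can be sketched roughly as follows. This is a minimal, defensive sketch rather than the tutorial's original code: it only checks that the packages listed in the Inputs table are installed and, if they are, pulls the two toy datasets from pharmaverseadam. The object names `required`, `is_installed`, and `missing_pkgs` are introduced here for illustration.

```r
# Packages listed in the Inputs table of this section.
required <- c(
  "gsm", "gsm.core", "gsm.kri", "gsm.mapping", "gsm.reporting",
  "tidyverse", "pharmaverseadam"
)

# requireNamespace() returns TRUE/FALSE instead of erroring on a missing package.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Toy ADaM datasets shipped with pharmaverseadam.
  adsl <- pharmaverseadam::adsl
  adae <- pharmaverseadam::adae
}
```

Checking with `requireNamespace()` first keeps the script from stopping halfway through when one of the gsm packages has not been installed yet.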
`OverallMetric`, `Factor`, and `Score` columns created from normal approximation.
ℹ 14 Group(s) have insufficient sample size due to KRI denominator less than 30: 705, 715, 711, 707, 716, 703, 704, 713, 706, 702, 709, 714, 718, 717
These group(s) will not have KRI score and flag summarized.
ℹ Sorted dfFlagged using custom Flag order: 2.
Sorted dfFlagged using custom Flag order: -2.
Sorted dfFlagged using custom Flag order: 1.
Sorted dfFlagged using custom Flag order: -1.
Sorted dfFlagged using custom Flag order: 0.
label_AE_1 <- list(Metric = "AE", Numerator = "Count of AEs", Denominator = "Count of Subjects")
dfbooud_ad_1 <- Analyze_NormalApprox_PredictBounds(KRI_1, strType = "rate")
nStep was not provided. Setting default step to 0.2.
The scatterplot widget allows visual inspection of the site-level metric values, flags, and model-based bounds. This can be helpful for validating the analysis results before including them in the final report. The Widget_ScatterPlot() function calls a JavaScript file that is included in the package.
See https://github.com/Gilead-BioStats/gsm.kri/tree/dev/inst/htmlwidgets
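To make the idea of model-based bounds more concrete, here is a minimal base-R sketch. It assumes a simplified normal approximation for a rate, where the bound at a given denominator is the overall rate plus a threshold multiplied by sqrt(rate / denominator). The overall rate `mu` and the denominator grid are hypothetical values, and this is not the implementation of Analyze_NormalApprox_PredictBounds(), which may differ (for example in how the grid is stepped or how over-dispersion is handled).

```r
# Hypothetical overall rate (events per subject) and threshold set.
mu <- 0.5
thresholds <- c(-3, -2, 2, 3)

# Hypothetical grid of site denominators at which to evaluate the bounds.
denominators <- seq(5, 50, by = 5)

# One row per (denominator, threshold) pair.
bounds <- expand.grid(Denominator = denominators, Threshold = thresholds)

# Simplified normal-approximation bound; a rate cannot be negative,
# so the lower bounds are clipped at zero.
bounds$Metric <- pmax(mu + bounds$Threshold * sqrt(mu / bounds$Denominator), 0)

head(bounds)
```

Plotting `Metric` against `Denominator` for each threshold gives funnel-shaped curves: the bounds narrow as the denominator grows, which is why small sites need a more extreme rate before they are flagged.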
KRI 2
This section calculates a second site-level rate KRI using a reduced event dataset. In practice, the second KRI would usually be a different metric from the first, but for simplicity this example reuses the same rate-based approach with a smaller numerator source.
Common metrics include the following:
- Serious Adverse Event Reporting Rate
- Non-important Protocol Deviation Rate
- Important Protocol Deviation Rate
- Grade 3+ Lab Abnormality Rate
- Study Discontinuation Rate
- Treatment Discontinuation Rate
- Query Rate
- Outstanding Query Rate
- Outstanding Data Entry Rate
- Data Change Rate
- Screen Failure Rate
What this section does:
- Repeats the same analytic steps used for KRI_1.
- Uses adae %>% head(1000) as the numerator source.
- Produces a second summarized KRI object that can be included in the same report.
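The arithmetic behind a site-level rate metric can be sketched in base R as follows. This is a hand-rolled illustration on made-up data, not the gsm.core implementation: the helper `site_rate()` is hypothetical, the toy `subjects` and `events` tables stand in for adsl and adae, and the z-score is a simplified normal approximation that the package may refine (for example with an over-dispersion adjustment). The output column names follow gsm-style naming but are assumptions here.

```r
# Toy stand-ins for adsl (one row per subject) and adae (one row per event).
subjects <- data.frame(
  USUBJID = paste0("S", 1:8),
  SITEID  = c("701", "701", "701", "701", "702", "702", "703", "703")
)
events <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S5", "S5", "S5", "S6", "S7")
)

# Hypothetical helper: events per subject by site, plus a simple z-score.
site_rate <- function(subjects, events) {
  site_levels <- sort(unique(subjects$SITEID))
  # Look up the site of each event through its subject.
  event_sites <- subjects$SITEID[match(events$USUBJID, subjects$USUBJID)]
  out <- data.frame(
    GroupID     = site_levels,
    # factor() with fixed levels keeps sites that have zero events.
    Numerator   = as.vector(table(factor(event_sites, levels = site_levels))),
    Denominator = as.vector(table(factor(subjects$SITEID, levels = site_levels)))
  )
  out$Metric <- out$Numerator / out$Denominator
  overall <- sum(out$Numerator) / sum(out$Denominator)
  # Simplified normal approximation: distance from the overall rate
  # in units of sqrt(overall / denominator).
  out$Score <- (out$Metric - overall) / sqrt(overall / out$Denominator)
  out
}

kri_1_toy <- site_rate(subjects, events)           # full event source
kri_2_toy <- site_rate(subjects, head(events, 5))  # reduced event source
```

Calling the same helper with `head(events, 5)` mirrors the way this section reuses the first KRI's steps with a smaller numerator source; only the event table changes, while the denominator population stays the same.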
Inputs:

| Input | Content | Purpose |
|---|---|---|
| adsl | data.frame/tibble containing subject-level records. | Provides denominator data and site grouping structure. |
| adae %>% head(1000) | Reduced data.frame/tibble containing event-level records. | Provides a smaller numerator event source for the second KRI. |
| v_threshold | Numeric vector of threshold breakpoints. | Controls thresholding in the flagging step. |
| v_flag | Numeric vector of flag values. | Controls category assignment during flagging. |
| v_risk_score_weight | Numeric vector of risk-score weights. | Controls weighted risk-score assignment in the flagged output. |
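How the three vectors in the table work together can be illustrated with a small base-R sketch. It assumes that four thresholds split the score axis into five bins and that each bin maps to one flag value and one weight. The thresholds match the values that appear in the console output of this tutorial (-3, -2, 2, 3); the weights and the example scores are hypothetical, and the handling of scores that fall exactly on a threshold may differ in gsm.core.

```r
v_threshold <- c(-3, -2, 2, 3)          # breakpoints on the score axis
v_flag <- c(-2, -1, 0, 1, 2)            # one flag per bin (thresholds + 1 bins)
v_risk_score_weight <- c(8, 2, 0, 2, 8) # hypothetical weights, one per bin

# Hypothetical site-level scores, one in each bin.
scores <- c(-3.4, -2.5, 0.3, 2.1, 3.8)

# findInterval() counts how many thresholds each score has reached (0 to 4);
# adding 1 turns that count into a bin index (1 to 5).
bin <- findInterval(scores, v_threshold) + 1

flagged <- data.frame(
  Score  = scores,
  Flag   = v_flag[bin],
  Weight = v_risk_score_weight[bin]
)
flagged
```

A flag of 0 marks a site inside the inner thresholds, flags of -1 and 1 mark moderate deviations, and flags of -2 and 2 mark the most extreme sites in either direction.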
Outputs:

| Object | Content | Purpose |
|---|---|---|
| KRI_2 | data.frame/tibble containing summarized site-level metric output from gsm.core::Summarize(). | Second analysis result used in the final report. |
| label_SAE | Named list containing chart labels for the second metric. | Used by the widget for display text. |
| dfbooud_ad_2 | data.frame containing prediction bounds for plotting. | |
`OverallMetric`, `Factor`, and `Score` columns created from normal approximation.
ℹ 14 Group(s) have insufficient sample size due to KRI denominator less than 30: 716, 704, 705, 703, 711, 715, 713, 717, 714, 707, 706, 702, 709, 718
These group(s) will not have KRI score and flag summarized.
ℹ Sorted dfFlagged using custom Flag order: 2.
Sorted dfFlagged using custom Flag order: -2.
Sorted dfFlagged using custom Flag order: 1.
Sorted dfFlagged using custom Flag order: -1.
Sorted dfFlagged using custom Flag order: 0.
label_SAE <- list(Metric = "SAE", Numerator = "Count of SAEs", Denominator = "Count of Subjects")
dfbooud_ad_2 <- Analyze_NormalApprox_PredictBounds(KRI_2, strType = "rate")
nStep was not provided. Setting default step to 0.2.
MetricID not found in dfBounds. Please double check input data if intentional.
Parsed -3,-2,2,3 to numeric vector: -3, -2, 2, 3
ℹ Only one snapshot found. Time series charts will not be generated.
Parsed -3,-2,2,3 to numeric vector: -3, -2, 2, 3
ℹ Only one snapshot found. Time series charts will not be generated.
The final report generation step is included but not executed, since the upstream objects are not meant to be production-ready. The tutorial focuses on the data manipulation and metric calculation steps that lead up to the report call, which can be enabled once the user has validated the intermediate objects in their local environment.
The report call uses a built-in Rmd template that ships with gsm.kri, but users can also build their own custom Rmd templates that consume the same data model. When executed, the call renders the report and writes the HTML output to disk, where it can be shared with stakeholders.
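As a rough illustration of the custom-template route, the sketch below writes a tiny parameterized Rmd file and renders it with rmarkdown::render(). It does not use the gsm.kri template or its report function. The `results` table, its column names, the file names, and the `dfResults` parameter name are all hypothetical stand-ins for the real report data model. Rendering is skipped when rmarkdown or pandoc is not available.

```r
# Hypothetical stand-in for a results table in the report data model.
results <- data.frame(
  GroupID  = c("701", "702"),
  MetricID = c("kri_1", "kri_1"),
  Score    = c(-0.5, 1.4),
  Flag     = c(0, 0)
)

# Build a minimal parameterized Rmd; the chunk fence is assembled with
# strrep() so that no literal fence appears inside this example.
fence <- strrep("`", 3)
template_lines <- c(
  "---",
  "title: \"Toy KRI report\"",
  "output: html_document",
  "params:",
  "  dfResults: !r data.frame()",
  "---",
  "",
  paste0(fence, "{r, echo=FALSE}"),
  "knitr::kable(params$dfResults)",
  fence
)
template_path <- file.path(tempdir(), "toy_kri_report.Rmd")
writeLines(template_lines, template_path)

# Render only when the tooling is present.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  report_path <- rmarkdown::render(
    input = template_path,
    output_file = "toy_kri_report.html",
    output_dir = tempdir(),
    params = list(dfResults = results),
    quiet = TRUE
  )
}
```

Passing the data through `params` keeps the template free of hard-coded object names, so the same Rmd can be rendered again for a later snapshot.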