Understanding these patterns could help clinics implement targeted interventions such as additional reminders for high-risk patients, better education about the importance of urodynamic testing, or addressing specific concerns that commonly lead to cancellations, ultimately improving resource utilization and patient care in urogynecology practices.

1 Load Data

2 Summary of Data Preparation Code

The data prep pipeline performs these key steps:

Load Pre-processed Dataset
- Reads the pre-processed RDS file (230119Urodynamics_DATA_LABELS_2024-09-16_1049.rds)
- Data was previously cleaned from REDCap export with PFDI-20 scores already integrated
Handle Missing Cancellation Status
- Patients with missing cancellation status are assumed to have completed their procedure (set to “Completed”)
- This retains patients who completed their procedure but had no status recorded
Remove Non-contributory Variables
- Removes administrative columns: Record ID, MRN, Complete status fields
- These variables are identifiers and should not be used in predictive modeling
Fix Data Entry Errors
- Corrects Year 2034 → 2024 (typo in original data)
- Recodes Tobacco use to Yes/No (Current/Former → Yes, Non-tobacco → No)
Feature Selection (Later in Pipeline)
- Near-zero variance columns removed via caret::nearZeroVar
- LASSO regularization selects optimal predictors
- Complete case analysis performed after variable selection

Final output: Clean dataset with demographic, clinical, and PFDI-20 data for predicting urodynamics procedure cancellation.

3 Data Cleaning

Consolidate Multiple Urodynamics Reason Columns into One

#' Consolidate Multiple Urodynamics Reason Columns into One
#'
#' This function takes a data frame containing multiple 'reason for urodynamics'
#' columns and consolidates them into a single column with appropriate levels.
#' The function identifies columns matching the pattern "What was the reason for
#' the urodynamics? (choice=*)" and creates a new column with the consolidated values.
#' The original reason columns are removed from the output data frame.
#'
#' @param urodynamics_dataset A data frame containing the urodynamics data with multiple
#'                           reason columns.
#' @param output_file Optional. Path to save the processed data frame. If NULL (default),
#'                   no file is saved.
#' @param verbose Logical. If TRUE, provides detailed logging of the transformation
#'                process. Default is FALSE.
#'
#' @return A modified data frame with a new consolidated column for urodynamics reasons
#'         and all original reason columns removed.
#'
#' @importFrom dplyr mutate select
#' @importFrom tidyr unite
#' @importFrom assertthat assert_that
#' @importFrom logger log_info log_debug log_error
#'
#' @examples
#' # Example 1: Basic usage with minimal output
#' urodynamics_data <- read.csv("urodynamics_data.csv")
#' processed_urodynamics <- consolidate_urodynamics_reasons(
#'   urodynamics_dataset = urodynamics_data,
#'   output_file = NULL,
#'   verbose = FALSE
#' )
#' # The resulting data frame has a new column "What was the reason for the urodynamics?"
#' # with values like "pre-op for prolapse surgery" or "evaluation of OAB"
#' # All original reason columns are removed from the output
#'
#' # Example 2: Save output to file with verbose logging
#' processed_with_log <- consolidate_urodynamics_reasons(
#'   urodynamics_dataset = urodynamics_data,
#'   output_file = "processed_urodynamics.csv",
#'   verbose = TRUE
#' )
#' # Console will show detailed logs of each transformation step
#' # File "processed_urodynamics.csv" will be created with processed data
#'
#' # Example 3: Process data with custom column selection
#' # First create a subset of the data with only some columns
#' subset_data <- urodynamics_data[, c("Record ID", "Age:",
#'   "What was the reason for the urodynamics? (choice=pre-op for prolapse surgery)",
#'   "What was the reason for the urodynamics? (choice=pre-op for sling)",
#'   "What was the reason for the urodynamics? (choice=evaluation of OAB)")]
#' # Then process this subset
#' processed_subset <- consolidate_urodynamics_reasons(
#'   urodynamics_dataset = subset_data,
#'   output_file = "subset_processed.csv",
#'   verbose = TRUE
#' )
#' # Note: Even with a subset of reason columns, the function will work correctly
consolidate_urodynamics_reasons <- function(urodynamics_dataset,
                                           output_file = NULL,
                                           verbose = FALSE) {
  # Initialize logger
  if (verbose) {
    logger::log_threshold(logger::INFO)
  } else {
    logger::log_threshold(logger::WARN)
  }
  
  logger::log_info("Starting consolidation of urodynamics reasons")
  
  # Validate input
  tryCatch({
    assertthat::assert_that(is.data.frame(urodynamics_dataset))
    logger::log_info("Input validation passed: urodynamics_dataset is a data frame")
    
    if (!is.null(output_file)) {
      assertthat::assert_that(is.character(output_file), length(output_file) == 1)
      logger::log_info(paste("Output will be saved to:", output_file))
    } else {
      logger::log_info("No output file specified, results will be returned only")
    }
    
    # Input data summary
    logger::log_info(paste("Input data dimensions:", 
                          nrow(urodynamics_dataset), "rows,", 
                          ncol(urodynamics_dataset), "columns"))
    
    # Find all reason columns
    reason_pattern <- "What was the reason for the urodynamics\\? \\(choice="
    reason_columns <- grep(reason_pattern, names(urodynamics_dataset), value = TRUE)
    
    if (length(reason_columns) == 0) {
      stop("No columns found matching the pattern for urodynamics reasons")
    }
    
    logger::log_info(paste("Found", length(reason_columns), "reason columns:"))
    if (verbose) {
      for (col in reason_columns) {
        logger::log_debug(paste("  *", col))
      }
    }
    
    # Process the data using helper functions
    processed_urodynamics <- process_reason_columns(
      urodynamics_dataset = urodynamics_dataset,
      reason_columns = reason_columns,
      reason_pattern = reason_pattern,
      verbose = verbose
    )
    
    # Save output if requested
    if (!is.null(output_file)) {
      tryCatch({
        utils::write.csv(processed_urodynamics, file = output_file, row.names = FALSE)
        logger::log_info(paste("Output successfully saved to:", output_file))
      }, error = function(e) {
        logger::log_error(paste("Error saving output file:", e$message))
        stop(paste("Error saving output file:", e$message))
      })
    }
    
    logger::log_info("Consolidation of urodynamics reasons completed successfully")
    return(processed_urodynamics)
    
  }, error = function(e) {
    logger::log_error(paste("Error in consolidate_urodynamics_reasons:", e$message))
    stop(paste("Error in consolidate_urodynamics_reasons:", e$message))
  })
}

#' Process reason columns to create consolidated reason column
#'
#' @param urodynamics_dataset Data frame with urodynamics data
#' @param reason_columns Vector of column names for reasons
#' @param reason_pattern Pattern to extract the reason value
#' @param verbose Whether to log detailed information
#'
#' @return Data frame with consolidated reason column
#'
#' @noRd
process_reason_columns <- function(urodynamics_dataset, reason_columns, 
                                   reason_pattern, verbose) {
  logger::log_info("Starting to process reason columns")
  
  # Create a new column for each reason, with NA or the reason value
  processed_dataset <- urodynamics_dataset
  
  # Map of reasons (will be filled during processing)
  reason_values <- c()
  
  # For each reason column, extract the reason and set a new column with the value
  for (col in reason_columns) {
    # Extract the reason from the column name
    reason_value <- extract_reason_value(col, reason_pattern)
    reason_values <- c(reason_values, reason_value)
    
    if (verbose) {
      logger::log_debug(paste("Processing column:", col))
      logger::log_debug(paste("Extracted reason:", reason_value))
    }
    
    # Create a new column with the reason value if the original is "Checked"
    processed_dataset <- dplyr::mutate(processed_dataset, 
                                      !!paste0("Reason_", reason_value) := 
                                        ifelse(.data[[col]] == "Checked", 
                                               reason_value, NA))
  }
  
  # Create the consolidated reason column
  logger::log_info("Creating consolidated reason column")
  
  # Get the new individual reason columns
  new_reason_columns <- paste0("Reason_", reason_values)
  
  # Use coalesce to create the consolidated column
  processed_dataset <- create_consolidated_column(
    processed_dataset = processed_dataset,
    reason_columns = new_reason_columns,
    verbose = verbose
  )
  
  # Clean up temporary columns and original reason columns
  logger::log_info("Removing temporary columns and original reason columns")
  columns_to_remove <- c(new_reason_columns, reason_columns)
  processed_dataset <- processed_dataset[, !names(processed_dataset) %in% columns_to_remove]
  
  if (verbose) {
    logger::log_debug(paste("Removed", length(columns_to_remove), "columns"))
    logger::log_debug(paste("Remaining columns:", ncol(processed_dataset)))
  }
  
  return(processed_dataset)
}

#' Extract reason value from column name
#'
#' @param column_name Column name containing the reason
#' @param pattern Pattern to match for extracting the reason
#'
#' @return Extracted reason value
#'
#' @noRd
extract_reason_value <- function(column_name, pattern) {
  # Extract the part after "choice=" and before the closing parenthesis
  reason <- gsub(pattern, "", column_name)
  reason <- gsub("\\)$", "", reason)
  
  return(reason)
}

#' Create consolidated reason column using coalesce
#'
#' @param processed_dataset Dataset with individual reason columns
#' @param reason_columns Vector of column names to consolidate
#' @param verbose Whether to log detailed information
#'
#' @return Dataset with added consolidated column
#'
#' @noRd
create_consolidated_column <- function(processed_dataset, reason_columns, verbose) {
  if (verbose) {
    logger::log_debug("Creating consolidated column from:")
    for (col in reason_columns) {
      logger::log_debug(paste("  *", col))
    }
  }
  
  # Create a consolidated column by coalescing all individual reason columns
  processed_dataset$`What was the reason for the urodynamics?` <- NA
  
  # Iterate through each row
  for (i in 1:nrow(processed_dataset)) {
    for (col in reason_columns) {
      if (!is.na(processed_dataset[i, col])) {
        processed_dataset[i, "What was the reason for the urodynamics?"] <- 
          processed_dataset[i, col]
        break  # Take the first non-NA value
      }
    }
  }
  
  # If multiple reasons are checked, they'll be concatenated
  logger::log_info("Consolidated reason column created successfully")
  
  return(processed_dataset)
}

Data Preview (First 20 of 841 Observations)
Was the procedure cancelled?	Age:	BMI	Race:	Is the patient hispanic, latino or of Spanish origin?	Does the patient have diabetes?	Does the patient have a h/o recurrent UTIs?	Immunocompromised:	Tobacco use:	Menopause status:	Is the patient on vaginal estrogen?	Does the patient have OAB?	Average number of voids at night:	POP-Q stage:	Year	POPDI_6	CRADI_8	UDI_6	PFDI_20_total	What was the reason for the urodynamics?
Yes	49	34.18	Other	Yes	No	No	No	No	Pre-menopausal	No	No	1.0		2022	4.166667	6.250000	50.000000	100	pre-op for prolapse surgery
Yes	78	31.28	White	No	No	No	No	Yes	Post-menopausal	No	No	5.0		2022	79.166667	53.125000	91.666667	224	pre-op for prolapse surgery
Yes	63	27.17	White	No	Yes	No	No	Yes	Post-menopausal	No	Yes	1.0		2022	16.666667	50.000000	50.000000	117	pre-op for prolapse surgery
Yes	62	18.96	White	No	No	No	No	No	Post-menopausal	No	Yes	3.0	0	2022	0.000000	7.142857	20.833333	28	pre-op for sling
No	70	37.12	White	No	No	Yes	Yes	Yes	Post-menopausal	No	Yes	1.5	0	2022	8.333333	0.000000	91.666667	100	evaluation of OAB
No	59	29.87	White	No	No	No	No	No	Post-menopausal	No	Yes	4.0	3	2022	60.000000	56.250000	81.250000	198	pre-op for prolapse surgery
No	53	30.27	Other	No	No	Yes	No	No	Post-menopausal	No	No	4.5	1	2022	79.166667	78.125000	100.000000	257	pre-op for sling
No	37	25.79	Other	No	No	No	No	No	Pre-menopausal	No	No	1.0	1	2022	0.000000	6.250000	8.333333	15	pre-op for sling
No	86	34.01	White	No	No	No	No	No	Post-menopausal	Yes	No	3.5	3	2022	29.166667	18.750000	50.000000	100	pre-op for prolapse surgery
No	67	25.86	White	No	No	No	No	Yes	Post-menopausal	No	Yes	2.0	0	2022	0.000000	50.000000	62.500000	112	evaluation of OAB
No	61	30.90	Other	Yes	Yes	No	No	No	Post-menopausal	No	Yes	4.0	1	2022	29.166667	18.750000	50.000000	100	pre-op for sling
No	68	23.00	White	No	No	No	No	No	Post-menopausal	No	Yes	4.0	1	2022	29.166667	18.750000	50.000000	100	pre-op for sling
No	67	20.25	White	No	No	No	No	No	Post-menopausal	Yes	No	1.0	3	2022	29.166667	6.250000	50.000000	85	pre-op for prolapse surgery
No	74	27.34	Other	No	No	No	Yes	No	Post-menopausal	No	No	1.0	4	2022	33.333333	0.000000	37.500000	71	pre-op for prolapse surgery
No	51	36.07	Black or African American	No	Yes	No	No	No	Post-menopausal	No	Yes	5.5	0	2022	37.500000	6.250000	41.666667	85	evaluation of OAB
Yes	93	23.46	White	No	No	No	No	No	Post-menopausal	No	Yes	0.0	0	2022	29.166667	18.750000	50.000000	100	other
No	69	28.34	White	No	No	No	No	Yes	Post-menopausal	No	Yes	0.0	2	2022	29.166667	18.750000	50.000000	100	pre-op for prolapse surgery
No	47	35.73	White	No	No	No	No	No	Pre-menopausal	No	Yes	0.0	0	2022	16.666667	18.750000	58.333333	94	pre-op for sling
No	43	23.00	White	No	No	No	No	No	Pre-menopausal	No	Yes	1.0	0	2022	0.000000	0.000000	62.500000	62	evaluation of mixed incontinence
No	75	24.10	Other	No	No	No	No	No	Post-menopausal	Yes	No	1.5	4	2022	37.500000	6.250000	20.000000	64	pre-op for prolapse surgery

4 Are there Enough Events Per Predictor?

check_events_per_predictor(labels_df, "Was the procedure cancelled?")

📊 Logistic Regression Check: Events per Predictor Variable (EPV) 📊

==================================================================

Outcome Variable: Was the procedure cancelled?

Number of Observations: 841

Total Events (Yes): 57

Total Non-Events: 784

Number of Predictors: 19

Required Minimum Events (10 × 19): 190

4.0.1 Interpretation of Events Per Predictor Analysis

The Events Per Predictor Variable (EPV) rule is a guideline in logistic regression that recommends having at least 10 outcome events for each predictor variable in the model to ensure stable coefficient estimates and reliable inference.

In our dataset:

Total observations: 841 patients scheduled for urodynamic testing
Events (procedure cancellations): 57 patients had procedures cancelled due to urinary tract infection (UTI)
Non-events (completed procedures): 784 patients completed their scheduled procedures
Number of candidate predictors: 19 variables
Minimum events required: 190 (based on 10 × 19 predictors)
Current EPV ratio: 3 events per predictor variable

Assessment: With only 57 events and 19 candidate predictors, the EPV ratio (3) is below the recommended minimum of 10. This limitation will be addressed through LASSO regularization, which automatically selects a parsimonious subset of predictors, thereby improving the effective EPV ratio in the final model.

The LASSO variable selection process (described in the Feature Selection section below) will identify the most predictive variables, reducing the number of predictors in the final model and substantially improving the EPV ratio. The final model metrics are reported in the Results section.

5 Table 1

Looking at the demographics table comparing patients whose procedures were cancelled versus completed, there are several notable findings:

5.0.1 Statistically Significant Differences

Age (p = < 0.001):
- Patients with cancelled procedures had higher age (median 76 vs 64 years)
- IQR for cancelled: 66-79 vs completed: 52-73
- This suggests older age is associated with procedure cancellation
History of recurrent UTIs (p = 0.004):
- Higher proportion in the cancelled group (21.1% vs 8.5%)
- This indicates recurrent UTIs may increase risk of procedure cancellation
POP-Q stage (p = 0.022):
- Different distribution of prolapse stages between groups
- Stage 0 (no prolapse): 53.7% (cancelled) vs 36% (completed)
- Stage 2 (moderate): 11.1% (cancelled) vs 28.4% (completed)
- Stage 3-4 (advanced): 27.8% (cancelled) vs 23.3% (completed)
- This suggests pelvic organ prolapse stage distribution differs between groups
Menopause status (p = 0.015):
- Higher proportion in the cancelled group (91.2% vs 74.1%)
- This indicates post-menopausal status is associated with procedure cancellation

5.0.2 Variables Not Reaching Statistical Significance (P = .05 to .10)

BMI (p = 0.091):
- Patients with cancelled procedures had higher bmi (median 31 vs 28.5 kg/m²)
- IQR for cancelled: 25.6-35.2 vs completed: 24.8-33.5
- This suggests higher BMI may be associated with procedure cancellation
Diabetes (p = 0.053):
- Higher proportion in the cancelled group (24.6% vs 14.2%)
- This indicates diabetes may be a risk factor for procedure cancellation

Why are variables with P = .05 to .10 retained in the analysis?

These “trending” variables are kept for several important reasons:

LASSO handles variable selection: Unlike traditional stepwise regression that uses p-values for variable selection, LASSO uses cross-validation to select variables based on their contribution to predictive accuracy. A variable that does not reach statistical significance in univariate analysis may still improve model prediction when combined with other variables.
P-values are not selection criteria for LASSO: The p-values in Table 1 describe univariate associations (each variable tested alone). LASSO considers multivariate relationships and can identify variables that are predictive in combination, even if individually they show weaker associations.
Clinical plausibility: BMI and recurrent UTIs have biological rationale for affecting UTI risk and procedure cancellation. Excluding clinically relevant variables prematurely could result in an underspecified model.
Small sample size considerations: With only 57 events (cancellations), statistical power is limited. P-values near but above 0.05 may suggest meaningful effects that did not reach significance due to sample size constraints rather than true absence of association.
Let the data decide: By including these variables as candidates, we allow the LASSO algorithm to make an objective, data-driven decision about their inclusion based on cross-validated prediction error—not arbitrary p-value cutoffs.

5.0.3 No Significant Differences

The following factors showed no significant association with procedure cancellation (P ≥ 0.10): - Hispanic/Latino - Race - Immunocompromised status - Tobacco use - Vaginal estrogen use - OAB status - Nighttime voiding frequency - Year - PFDI-20 total - POPDI-6 - CRADI-8

5.0.4 Clinical Interpretation

The data suggests that Age, Recurrent UTIs, POP-Q stage, Menopause status are statistically significant predictors of procedure cancellation (P < 0.05). Additionally, BMI, Diabetes, UDI-6 showed differences between groups but did not reach the prespecified significance threshold (0.05 ≤ P < 0.10); however, these variables may warrant clinical consideration as potential risk factors.

The sample size for cancellations is relatively small (N=57 vs N=784), which may limit statistical power to detect some associations. The significant age difference suggests pre-procedure assessment might need to be more comprehensive for elderly patients to reduce cancellation rates.

5.1 TRIPOD Flow Diagram

The following flow diagram follows the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) guidelines for reporting patient flow in prediction model studies.

# Create TRIPOD-compliant flow diagram using ggplot2
library(ggplot2)

# Define box positions and sizes
box_width <- 3.5
box_height <- 0.8

# Create data for boxes using PROGRAMMATIC values from tripod_labels
boxes <- data.frame(
  id = c("source", "initial", "handled1", "excl2", "excl3", "eligible", "missing", "final", "outcome_yes", "outcome_no"),
  x = c(5, 5, 9, 9, 9, 5, 9, 5, 3, 7),
  y = c(10, 8.5, 8.5, 7, 5.5, 6.5, 6.5, 5, 3, 3),
  label = c(
    tripod_labels$source,
    tripod_labels$initial,
    tripod_labels$handled_missing,  # Changed from excl_missing - now informational (retained)
    tripod_labels$excl_noshow,
    tripod_labels$excl_other,
    tripod_labels$eligible,
    tripod_labels$imputed,
    tripod_labels$final,
    tripod_labels$cancelled,
    tripod_labels$completed
  ),
  fill = c("#3498DB", "#E8F6F3", "#D5F5E3", "#FADBD8", "#FADBD8", "#E8F6F3", "#FEF9E7", "#27AE60", "#E74C3C", "#2ECC71")
  # Note: handled_missing uses green (#D5F5E3) instead of red since patients are retained
)

# Create data for arrows
arrows <- data.frame(
  x = c(5, 5, 5, 5, 5, 5, 5, 5),
  xend = c(5, 7.2, 7.2, 7.2, 5, 7.2, 3.5, 6.5),
  y = c(9.6, 8.5, 7.8, 6.3, 8.1, 6.5, 4.6, 4.6),
  yend = c(8.9, 8.5, 7, 5.5, 6.9, 6.5, 3.4, 3.4)
)

# Create the plot
tripod_plot <- ggplot() +
  # Draw boxes
  geom_rect(data = boxes,
            aes(xmin = x - box_width/2, xmax = x + box_width/2,
                ymin = y - box_height/2, ymax = y + box_height/2,
                fill = fill),
            color = "black", linewidth = 1) +
  # Add labels
  geom_text(data = boxes,
            aes(x = x, y = y, label = label),
            size = 3.5, fontface = "bold", lineheight = 0.9) +
  # Draw arrows (main flow)
  geom_segment(aes(x = 5, y = 9.6, xend = 5, yend = 8.9),
               arrow = arrow(length = unit(0.3, "cm")), linewidth = 1) +
  geom_segment(aes(x = 5, y = 8.1, xend = 5, yend = 6.9),
               arrow = arrow(length = unit(0.3, "cm")), linewidth = 1) +
  geom_segment(aes(x = 5, y = 6.1, xend = 5, yend = 5.4),
               arrow = arrow(length = unit(0.3, "cm")), linewidth = 1) +
  geom_segment(aes(x = 5, y = 4.6, xend = 3.5, yend = 3.4),
               arrow = arrow(length = unit(0.3, "cm")), linewidth = 1) +
  geom_segment(aes(x = 5, y = 4.6, xend = 6.5, yend = 3.4),
               arrow = arrow(length = unit(0.3, "cm")), linewidth = 1) +
  # Draw arrows (exclusions)
  geom_segment(aes(x = 6.8, y = 8.5, xend = 7.2, yend = 8.5),
               arrow = arrow(length = unit(0.3, "cm")), linewidth = 0.8, linetype = "dashed") +
  geom_segment(aes(x = 6.8, y = 7.8, xend = 7.2, yend = 7),
               arrow = arrow(length = unit(0.3, "cm")), linewidth = 0.8, linetype = "dashed") +
  geom_segment(aes(x = 6.8, y = 6.3, xend = 7.2, yend = 5.5),
               arrow = arrow(length = unit(0.3, "cm")), linewidth = 0.8, linetype = "dashed") +
  # Arrow for missing data
  geom_segment(aes(x = 6.8, y = 6.5, xend = 7.2, yend = 6.5),
               arrow = arrow(length = unit(0.3, "cm")), linewidth = 0.8, linetype = "dotted") +
  # Use identity for fill colors
  scale_fill_identity() +
  # Theme and labels
  labs(title = "TRIPOD Flow Diagram: Patient Selection for Prediction Model Development",
       subtitle = "Urodynamic Procedure Cancellation Due to UTI") +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5, color = "gray40"),
    plot.margin = margin(20, 20, 20, 20)
  ) +
  coord_cartesian(xlim = c(0, 12), ylim = c(2, 11))

print(tripod_plot)

Figure: TRIPOD Flow Diagram - Patient Selection and Analysis Flow

Flow Diagram Summary:

Data Source: REDCap urodynamics database (study ID: 230119)
Initial records: 932 patients in database
Missing status handling: 38 patients with missing cancellation status were set to “Completed” (assumed procedure completed)
Exclusions:
- No-show (did not attend): 53 patients
- Cancellation for other reasons (not UTI): 0 patients
Final cohort: 841 patients
- Completed procedures: 784 (93.2%)
- Cancelled due to UTI: 57 (6.8%)
Missing predictor data: Imputation for 15 patients with incomplete predictor data

Table 1. Demographic and Clinical Characteristics Stratified by Procedure Cancellation Status

Demographics Stratified by Procedure Cancellation Status
	Completed (N=784)	Cancelled (N=57)	Total (N=841)	p value
Age (years)				< 0.001 (1)
- Median	64.0	76.0	65.0
- Q1, Q3	52.0, 73.0	66.0, 79.0	53.0, 73.0
Body Mass Index (kg/m²)				0.090 (1)
- Median	28.5	31.0	28.6
- Q1, Q3	24.8, 33.5	25.6, 35.2	24.8, 33.7
Race:				0.665 (2)
- American Indian or Alaska Native	1 (0.1%)	0 (0.0%)	1 (0.1%)
- Asian	12 (1.5%)	0 (0.0%)	12 (1.4%)
- Black or African American	36 (4.6%)	1 (1.8%)	37 (4.4%)
- White	630 (80.4%)	46 (80.7%)	676 (80.4%)
- Other	100 (12.8%)	9 (15.8%)	109 (13.0%)
- Native Hawaiian or Other Pacific Islander	5 (0.6%)	1 (1.8%)	6 (0.7%)
Is the patient hispanic, latino or of Spanish origin?				0.313 (2)
- No	695 (88.6%)	48 (84.2%)	743 (88.3%)
- Yes	89 (11.4%)	9 (15.8%)	98 (11.7%)
Does the patient have diabetes?				0.033 (2)
- Yes	111 (14.2%)	14 (24.6%)	125 (14.9%)
- No	673 (85.8%)	43 (75.4%)	716 (85.1%)
History of Recurrent Urinary Tract Infections				0.002 (2)
- Yes	67 (8.5%)	12 (21.1%)	79 (9.4%)
- No	717 (91.5%)	45 (78.9%)	762 (90.6%)
Immunocompromised:				0.734 (2)
- Yes	35 (4.5%)	2 (3.5%)	37 (4.4%)
- No	749 (95.5%)	55 (96.5%)	804 (95.6%)
Tobacco use:				0.374 (2)
- Yes	284 (36.2%)	24 (42.1%)	308 (36.6%)
- No	500 (63.8%)	33 (57.9%)	533 (63.4%)
Menopause status:				0.015 (2)
- Pre-menopausal	167 (21.3%)	4 (7.0%)	171 (20.3%)
- Post-menopausal	581 (74.1%)	52 (91.2%)	633 (75.3%)
- Unclear	36 (4.6%)	1 (1.8%)	37 (4.4%)
Is the patient on vaginal estrogen?				0.238 (2)
- Yes	119 (15.2%)	12 (21.1%)	131 (15.6%)
- No	665 (84.8%)	45 (78.9%)	710 (84.4%)
Does the patient have Overactive Bladder?				0.112 (2)
- Yes	396 (50.5%)	35 (61.4%)	431 (51.2%)
- No	388 (49.5%)	22 (38.6%)	410 (48.8%)
Average number of voids at night:				0.384 (1)
- Median	2.0	2.0	2.0
- Q1, Q3	1.0, 3.0	1.0, 3.0	1.0, 3.0
Pelvic Organ Prolapse Quantification Stage				0.022 (2)
- 0	282 (36.0%)	29 (53.7%)	311 (37.1%)
- 1	96 (12.2%)	4 (7.4%)	100 (11.9%)
- 2	223 (28.4%)	6 (11.1%)	229 (27.3%)
- 3	151 (19.3%)	13 (24.1%)	164 (19.6%)
- 4	32 (4.1%)	2 (3.7%)	34 (4.1%)
Pelvic Floor Distress Inventory-20 Total Score				0.387 (1)
- Median	100.0	100.0	100.0
- Q1, Q3	95.8, 100.2	100.0, 109.0	98.0, 101.0
Pelvic Organ Prolapse Distress Inventory-6				0.606 (1)
- Median	29.2	29.2	29.2
- Q1, Q3	25.0, 29.2	29.2, 33.3	25.0, 29.2
Colorectal-Anal Distress Inventory-8				0.738 (1)
- Median	18.8	18.8	18.8
- Q1, Q3	18.8, 21.9	18.8, 25.0	18.8, 21.9
Urinary Distress Inventory-6				0.170 (1)
- Median	50.0	50.0	50.0
- Q1, Q3	45.8, 50.0	50.0, 54.2	50.0, 50.0
Year				0.951 (2)
- 2022	288 (36.7%)	22 (38.6%)	310 (36.9%)
- 2023	332 (42.3%)	23 (40.4%)	355 (42.2%)
- 2024	164 (20.9%)	12 (21.1%)	176 (20.9%)
What was the reason for the urodynamics?				0.331 (2)
- evaluation of mixed incontinence	135 (17.2%)	9 (15.8%)	144 (17.1%)
- evaluation of neurogenic bladder	1 (0.1%)	0 (0.0%)	1 (0.1%)
- evaluation of OAB	179 (22.8%)	21 (36.8%)	200 (23.8%)
- evaluation of voiding dysfunction	19 (2.4%)	2 (3.5%)	21 (2.5%)
- other	33 (4.2%)	2 (3.5%)	35 (4.2%)
- pre-op for prolapse surgery	279 (35.6%)	17 (29.8%)	296 (35.2%)
- pre-op for sling	138 (17.6%)	6 (10.5%)	144 (17.1%)

Linear Model ANOVA
Pearson’s Chi-squared test

5.1.1 Remove spaces from column names

# Rename columns to remove spaces and special characters
colnames(labels_df) <- make.names(colnames(labels_df))

# Remove Record.ID if it still exists after make.names conversion
# This is a non-contributory identifier variable
if ("Record.ID" %in% names(labels_df)) {
  labels_df <- labels_df %>% select(-Record.ID)
  message("✅ Removed Record.ID from labels_df")
}

5.1.2 Ensure the outcome variable is a factor

5.1.3 Model

6 Feature Selection

This section uses LASSO (Least Absolute Shrinkage and Selection Operator) for objective, data-driven variable selection. Unlike traditional p-value based selection, LASSO uses regularization to automatically identify the most predictive variables while guarding against overfitting.

Why LASSO instead of p-value selection?

LASSO may select variables that do not have p < 0.10 in univariate analysis, and may exclude variables that do. This occurs because:

Multicollinearity handling: LASSO selects one variable from correlated groups (e.g., Menopause Status was excluded because Age is preferred for clinical interpretability; they correlate at r=0.77)
Predictive vs. associative: A variable can be statistically significant but add little predictive information once other variables are in the model
Regularization: LASSO penalizes model complexity, favoring parsimonious models that generalize better

The variables shown in the “LASSO Model Selected Features” table below are the final predictors used in the nomogram, regardless of their univariate p-values.

6.0.1 Variable Screening

Before LASSO feature selection, variables were screened for data quality issues. Variables were excluded if they had: (1) near-zero variance, (2) >90% missing data, or (3) correlation >0.90 with another predictor.

Variables screened: 20 | Excluded: 1 | Retained for LASSO: 18

Variable	Exclusion Reason
Immunocompromised.	Near-zero variance

7 Feature Selection with LASSO

LASSO (Least Absolute Shrinkage and Selection Operator) is a regularization method that automatically selects the most important predictors by shrinking less important coefficients to exactly zero. We used 10-fold cross-validation to identify the optimal regularization strength (λ) that minimizes prediction error while producing a parsimonious model.

# Create a data frame with the cross-validation results
cv_results <- data.frame(
  lambda = cv_lasso$lambda,
  mean_cross_validated_error = cv_lasso$cvm,           # Mean cross-validated error
  standard_error_of_cross_validation = cv_lasso$cvsd,         # Standard error of CV error
  upper_bound_for_cross_validation_error = cv_lasso$cvup,         # Upper bound for CV error
  lower_bound_for_cross_validation_error = cv_lasso$cvlo          # Lower bound for CV error
)

# Get the optimal lambda value first
optimal_lambda <- cv_lasso$lambda.min

# Find the row index of the optimal lambda
optimal_row <- which.min(abs(cv_results$lambda - optimal_lambda))

# Display actual cross-validation results from our data
# NOTE: head(20) is for DISPLAY ONLY - showing all lambda values would be unwieldy.
# The full CV results are used for model selection.
cv_results_display <- cv_results %>%
  head(20) %>%
  mutate(
    lambda = round(lambda, 6),
    mean_cross_validated_error = round(mean_cross_validated_error, 4),
    standard_error_of_cross_validation = round(standard_error_of_cross_validation, 4),
    upper_bound_for_cross_validation_error = round(upper_bound_for_cross_validation_error, 4),
    lower_bound_for_cross_validation_error = round(lower_bound_for_cross_validation_error, 4)
  ) %>%
  rename(
    `λ (Lambda)` = lambda,
    `CV Error` = mean_cross_validated_error,
    `Std Error` = standard_error_of_cross_validation,
    `Upper Bound` = upper_bound_for_cross_validation_error,
    `Lower Bound` = lower_bound_for_cross_validation_error
  )

# Create flextable with highlighted optimal row
cv_results_display %>%
  flextable() %>%
  set_caption(paste0("LASSO Cross-Validation Results from Our Urodynamics Data (Optimal λ = ",
                     round(optimal_lambda, 6), ")")) %>%
  theme_vanilla() %>%
  autofit() %>%
  bold(part = "header") %>%
  fontsize(size = 9, part = "all") %>%
  bg(i = optimal_row, bg = "#90EE90") %>%  # Highlight optimal lambda row in green
  bold(i = optimal_row) %>%
  add_footer_lines(paste0("Green highlighted row shows the optimal λ (lambda.min = ",
                          round(optimal_lambda, 6),
                          ") that minimizes cross-validated error.")) %>%
  fontsize(size = 8, part = "footer") %>%
  italic(part = "footer")

LASSO Cross-Validation Results from Our Urodynamics Data (Optimal λ = 0.009669)
λ (Lambda)	CV Error	Std Error	Upper Bound	Lower Bound
0.047015	0.4807	0.0541	0.5348	0.4266
0.042838	0.4764	0.0543	0.5307	0.4222
0.039033	0.4711	0.0538	0.5248	0.4173
0.035565	0.4663	0.0533	0.5196	0.4130
0.032406	0.4623	0.0529	0.5152	0.4094
0.029527	0.4591	0.0527	0.5118	0.4064
0.026904	0.4565	0.0525	0.5090	0.4040
0.024514	0.4544	0.0524	0.5068	0.4020
0.022336	0.4525	0.0524	0.5049	0.4001
0.020352	0.4505	0.0525	0.5030	0.3979
0.018544	0.4483	0.0527	0.5010	0.3955
0.016896	0.4460	0.0529	0.4988	0.3931
0.015395	0.4439	0.0531	0.4970	0.3909
0.014028	0.4424	0.0533	0.4957	0.3891
0.012781	0.4410	0.0535	0.4945	0.3876
0.011646	0.4401	0.0536	0.4937	0.3865
0.010611	0.4393	0.0538	0.4932	0.3855
0.009669	0.4389	0.0541	0.4929	0.3848
0.008810	0.4389	0.0543	0.4932	0.3846
0.008027	0.4393	0.0545	0.4937	0.3848
Green highlighted row shows the optimal λ (lambda.min = 0.009669) that minimizes cross-validated error.

7.0.1 Actual Cross-Validation Results from Our Data

These are the ACTUAL lambda values from our urodynamics data analysis (unlike the simulated values in the conceptual illustration above). The optimal λ value of 0.0097 was determined through 10-fold cross-validation on our actual dataset.

Cross-validation results from our urodynamics data showing optimal lambda selection

Interpretation of Our Cross-Validation Results:

The blue line shows how prediction error changes as we increase regularization (λ)
The shaded blue region represents uncertainty (±1 standard error)
The red dashed line marks our selected λ (0.009669) - the value that minimizes error
The green dashed line marks λ.1se (0.047015) - a more conservative choice

kable(selected_features_df,
      caption = "LASSO Model Selected Features and Coefficients",
      col.names = c("Feature", "Coefficient"),  # Match the actual column names
      align = c("l", "r"),                      # Left-align text, right-align numbers
      booktabs = TRUE,
      linesep = "",
      format = "html") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center"
  ) %>%
  row_spec(0, bold = TRUE, color = "black", background = "#f2f2f2") %>%
  column_spec(2, width = "100px")  # Set width for the coefficient column

LASSO Model Selected Features and Coefficients
Feature	Coefficient
Age.	0.0511055
BMI	0.0096123
Is.the.patient.hispanic..latino.or.of.Spanish.origin.Yes	0.1817979
Does.the.patient.have.a.h.o.recurrent.UTIs.Yes	0.5613181
POP.Q.stage.2	-0.4745999
What.was.the.reason.for.the.urodynamics.evaluation of OAB	0.2585838

8 LASSO Regularization Path

# Create the regularization path plot
ggplot(lasso_filtered, aes(x = log10(Lambda), y = Coefficient, color = Feature)) +
  geom_line(linewidth = 1.2) +  # Ensure lines are drawn
  geom_vline(xintercept = log10(optimal_lambda), linetype = "dashed", color = "red", linewidth = 1.2) +
  labs(
    title = "LASSO Regularization Path",
    subtitle = "Red line indicates optimal lambda from cross-validation",
    x = expression(log[10](lambda)),
    y = "Coefficient Values"
  ) +
  theme_minimal(base_size = 14) +  
  theme(
    legend.position = "right",  # Changed to show legend for feature identification
    axis.title = element_text(face = "bold"),
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12, color = "gray40")
  )

$Figure: LASSO Regularization Path showing how coefficient values change with lambda$

Figure: LASSO Regularization Path showing how coefficient values change with lambda

8.1 Interactive LASSO Regularization Path Visualization

library(plotly)

# Create ggplot first
p_base <- ggplot(lasso_filtered, aes(x = log_lambda, y = Coefficient, color = Feature)) +
  geom_line(linewidth = 0.8) +
  geom_vline(xintercept = log10(optimal_lambda), linetype = "dashed",
             color = "red", linewidth = 1) +
  labs(
    title = "Interactive LASSO Regularization Path",
    subtitle = "Hover over lines to identify features; click legend to toggle",
    x = "log₁₀(λ)",
    y = "Coefficient Value"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    legend.position = "right",
    axis.title = element_text(face = "bold"),
    plot.title = element_text(face = "bold", size = 14)
  ) +
  scale_color_viridis_d(option = "turbo")

# Convert to plotly for interactivity
p_interactive <- ggplotly(p_base, tooltip = c("color", "x", "y")) %>%
  layout(
    title = list(
      text = paste0("Interactive LASSO Regularization Path<br>",
                    "<sup>Hover to identify features | Click legend to toggle | ",
                    "Optimal λ = ", round(cv_lasso$lambda.min, 4), "</sup>"),
      font = list(size = 14)
    ),
    legend = list(
      orientation = "v",
      x = 1.02,
      y = 0.5,
      font = list(size = 9)
    ),
    hovermode = "closest"
  ) %>%
  # Add annotation for optimal lambda
  add_annotations(
    x = log10(optimal_lambda),
    y = max(lasso_filtered$Coefficient),
    text = paste0("λ.min = ", round(cv_lasso$lambda.min, 4)),
    showarrow = TRUE,
    arrowhead = 2,
    arrowsize = 0.5,
    ax = 40,
    ay = -30,
    font = list(color = "red", size = 10)
  )

p_interactive

Version 3: Interactive plotly visualization - hover over lines to see feature names

This interactive visualization allows you to hover over any line to see the feature name and coefficient value, click legend entries to show/hide specific features, and zoom/pan to explore specific regions.

Figure Interpretation: LASSO Regularization Path

This graph shows the LASSO regularization path for a logistic regression model, illustrating how coefficient values change as the regularization strength (lambda) varies.

Key Observations:

Regularization Effect: As lambda increases (moving left to right on the x-axis), more coefficients are pushed toward zero. This demonstrates LASSO’s feature selection capability.
Optimal Lambda: The vertical red line indicates the optimal lambda value (λ = 0.0097) determined through cross-validation. This represents the best balance between model complexity and predictive performance.
Selected Predictors at Optimal Lambda: LASSO selected 6 feature(s) with non-zero coefficients:
- Age. (coefficient: 0.0511)
- BMI (coefficient: 0.0096)
- Is.the.patient.hispanic..latino.or.of.Spanish.origin.Yes (coefficient: 0.1818)
- Does.the.patient.have.a.h.o.recurrent.UTIs.Yes (coefficient: 0.5613)
- POP.Q.stage.2 (coefficient: -0.4746)
- What.was.the.reason.for.the.urodynamics.evaluation of OAB (coefficient: 0.2586)
Variables Eliminated: All other features have coefficients shrunk to exactly zero at the optimal lambda, indicating they contribute little predictive value after accounting for the selected variable(s).
Direction of Effects:
- Features with positive coefficients (above zero) increase the probability of procedure cancellation
- Features with negative coefficients (below zero) decrease the probability of procedure cancellation

9 Cross-Validation Results

Figure Interpretation: Cross-Validation Plot

This figure shows the results of cross-validation for the LASSO logistic regression model, illustrating how model performance varies with different regularization strengths.

9.1 Key Elements of the Plot:

X-axis: Shows the log of lambda (Log(λ)) values, ranging from approximately -10 to -3. Lambda is the regularization parameter that controls the strength of the penalty in LASSO regression.
Y-axis: Shows the Binomial Deviance, which is the measure of error for your logistic regression model. Lower values indicate better model performance.
Red line: Represents the mean cross-validated error across the folds for each lambda value.
Gray error bars: Represent the standard error of the cross-validated error estimate.
Numbers at top: Indicate how many features (variables) are retained in the model at each lambda value. As lambda increases (moving right), fewer variables are kept in the model (from 30 variables at the lowest lambda to 0 at the highest).
Vertical dotted lines:
- The left vertical line marks lambda.min (the value of lambda that gives minimum mean cross-validated error)
- The right vertical line marks lambda.1se (the largest lambda value within one standard error of the minimum)

9.2 Interpretation:

The model achieves its lowest error (around 0.44) at a log lambda of approximately -4.6, where 6 variables are retained in the model.
The lambda.1se recommendation (right dotted line at approximately -3.1) suggests using a more parsimonious model with only 0 features, which is still within one standard error of the minimum error. This is often preferred as it’s simpler while maintaining comparable performance.
The plot demonstrates the tradeoff between model complexity and performance: as you move from left to right, the model becomes simpler (fewer variables) while the error generally increases.
This cross-validation plot helps you objectively select the optimal regularization parameter for your final LASSO model predicting procedure cancellation.

Based on this graph, your final model uses 6 features (for minimum error), which include the predictors shown in the LASSO Model Selected Features table above.

9.2.1 Understanding the LASSO Selection Process

The table above shows the predictors LASSO retained with non-zero coefficients at the optimal lambda. The coefficient magnitudes vary substantially based on the variable type and sample sizes in each subgroup.

Key Considerations for Predictor Selection:

Initial LASSO output: The cross-validation process identified the optimal λ, retaining predictors that contribute meaningfully to prediction after accounting for the regularization penalty.
Coefficient magnitude interpretation: Categorical variables may show larger coefficients for specific subgroups, but these represent smaller sample sizes. Continuous predictors have per-unit effects that apply across the entire patient population.
Final model selection: The prediction model uses the predictors identified by LASSO at the optimal lambda. This approach ensures:
- Objective variable selection through cross-validated regularization
- Clinical interpretability of the final model
- Protection against overfitting by penalizing less important predictors
- Reproducibility with data-driven variable selection
Model performance: The final model balances discrimination ability with parsimony to ensure generalizability for clinical use.

10 Model

extract_discrimination_metrics(model_output = model, verbose = TRUE)

11 Model Performance Interpretation

The table above shows key metrics evaluating how well the model predicts procedure cancellation.

Key Metrics:

C-statistic (0.774): The probability that the model correctly ranks a cancelled case higher than a completed case. Values >0.7 indicate good discrimination.
Brier Score (0.055): Measures prediction accuracy (0 = perfect, 1 = worst). Lower is better.
R² (0.167): The proportion of variance explained. Modest R² values are typical for rare outcomes like procedure cancellation (6.8% rate).

Note on Class Imbalance: With only 57 cancellations out of 841 procedures, we use an optimized probability threshold (0.07) rather than the default 0.5 to balance sensitivity and specificity.

11.0.1 Brier Score Plot

This calibration plot visualizes how well the predicted probabilities from your logistic regression model align with the actual observed outcomes. The x-axis (“Predicted Probability”) represents the probabilities that the model estimated, while the y-axis (“Observed Probability of Cancellation”) indicates how frequently cancellations occurred within each predicted probability bin. Each point (blue circle) corresponds to a group of observations with similar predicted probabilities, and the size of each circle reflects the number of observations in that group, as detailed in the legend. Vertical lines around each point represent the 95% confidence intervals, indicating the uncertainty of each observed proportion.

A dashed grey diagonal line shows perfect calibration (where predictions exactly match observations). Your model shows good overall calibration, as evidenced by points closely following this ideal diagonal line, particularly for predicted probabilities below approximately 0.3. However, at higher probabilities (around 0.3 to 0.4), there is more deviation from perfect calibration, suggesting predictions in this range might be less reliable due to fewer observations or greater variability. The Brier score (0.0601) supports this interpretation, indicating good predictive accuracy overall, with only minor discrepancies observed at higher predicted probabilities.

# Create the calibration plot using the function
calibration_plot <- create_calibration_plot(model)

# Display the plot
print(calibration_plot)

The Brier Score and Area Under the Receiver Operating Characteristic Curve (AUC) are both valuable metrics used to assess model performance, though they measure different aspects of predictive accuracy. The Brier Score evaluates a model’s overall accuracy by simultaneously assessing calibration—how closely the predicted probabilities match observed outcomes—and its discriminatory power, or the ability to distinguish between different outcome classes. It ranges from 0 to 1, where lower values indicate better performance. A score of 0 represents perfect prediction accuracy, whereas higher scores indicate poorer performance, with scores closer to 0.25 typically suggesting no better performance than random chance in a balanced binary classification. Conversely, the AUC specifically quantifies a model’s ability to discriminate between two outcome classes. It ranges from 0.5 to 1.0, with 0.5 indicating random guessing (no discrimination ability) and 1.0 indicating perfect discrimination. For instance, an AUC of 0.85 demonstrates strong ability to correctly distinguish between classes based on predicted probabilities, but it does not guarantee that these predicted probabilities accurately reflect the actual observed frequencies. Thus, while the Brier Score combines both calibration (the accuracy of predicted probabilities) and discrimination (the ability to distinguish between outcomes), the AUC exclusively measures discrimination.

11.0.2 AUC Plot

roc_results <- plot_roc_curve(
   input_data = selected_labels_df,
   outcome_variable = "Was.the.procedure.cancelled.",
   prediction_model = model,
   prediction_type = "response",
   #optimal_threshold = TRUE,
   confidence_intervals = TRUE,
   ci_method = "bootstrap",
   #custom_title = NULL,
   save_path = NULL,
   verbose = TRUE
 )

print(roc_results)

$roc_object

Call: roc.default(response = actual_outcome_values, predictor = prediction_values)

Data: prediction_values in 782 controls (actual_outcome_values 0) > 53 cases (actual_outcome_values 1). Area under the curve: 0.5353

$plot_object

The values in your ROC curve represent important metrics related to the performance of your predictive model:

Interpreting ROC Results (Note: Values calculated below in the optimal threshold section)

The ROC curve analysis helps evaluate model performance at different classification thresholds. Key concepts:

Sensitivity (True Positive Rate) - The proportion of actual cancellations correctly identified by the model - Higher sensitivity means fewer missed cancellations

Specificity (True Negative Rate) - The proportion of completed procedures correctly identified by the model - Higher specificity means fewer false alarms

What this means for your model: - The AUC of 0.774 indicates a moderately good model (values range from 0.5 for random guessing to 1.0 for perfect prediction) - At the optimal threshold (calculated below), the model balances sensitivity and specificity - The specific sensitivity and specificity values at the optimal threshold are reported in the subsequent sections

Depending on your clinical goals, you might want to adjust this threshold: - If missing a cancellation is very costly, use a lower threshold to increase sensitivity - If falsely predicting cancellations is problematic, use a higher threshold to increase specificity

11.0.3 Confusion Matrix

# Calculate prediction range statistics for programmatic reporting (needed for text below)
pred_min <- round(min(predictions, na.rm = TRUE), 2)
pred_max <- round(max(predictions, na.rm = TRUE), 2)
pred_q95 <- round(quantile(predictions, 0.95, na.rm = TRUE), 2)

# Histogram of predicted probabilities
hist(predictions,
     breaks = 30,
     main = "Distribution of Predicted Cancellation Probabilities",
     xlab = "Predicted Probability of Cancellation",
     ylab = "Number of Patients",
     col = "steelblue",
     border = "white")
abline(v = optimal_threshold, col = "red", lwd = 2, lty = 2)
legend("topright", legend = c(paste0("Optimal Threshold (", optimal_threshold_display, ")")), col = "red", lty = 2, lwd = 2)

Distribution of predicted probabilities

11.0.4 Interpreting the Histogram of Predictions

The histogram above shows the distribution of predicted cancellation probabilities across all 841 patients in the dataset. Key observations:

Right-skewed distribution: Most patients have low predicted probabilities (concentrated below 0.21, the 95th percentile), which reflects the low baseline cancellation rate (6.8%) in the population.
Peak near the threshold: The distribution peaks near the optimal threshold (0.07), indicating most patients are predicted to have relatively low risk of cancellation. This is expected given the 93.2% completion rate.
Long right tail: A smaller number of patients extend to higher probabilities (up to 0.65), representing high-risk individuals (typically older patients with history of recurrent UTIs).
Red dashed line: The optimal classification threshold (0.07) is shown. Patients with predicted probabilities above this line are classified as “likely to cancel.”

Why is the threshold so low? The threshold of 0.07 (7%) may seem low, but it reflects the base rate of cancellation in the data. Because cancellations are rare, the optimal threshold that balances sensitivity and specificity is near the population prevalence. Using the standard 0.5 threshold would result in classifying virtually no patients as “at risk.”

Optimal Threshold Interpretation (Programmatic): The optimal classification threshold of 0.0244637 (2.4%) was determined by maximizing Youden’s J statistic (sensitivity + specificity - 1), which identifies the probability cutoff that best balances the ability to detect true cancellations (sensitivity) while minimizing false alarms (1 - specificity). This threshold is close to the base cancellation rate of 6.8%, which is expected when the outcome is rare—the optimal threshold gravitates toward the prevalence to avoid systematic over- or under-prediction.

# 5. Print the confusion matrix as a formatted table
conf_df <- as.data.frame.matrix(conf_matrix)
conf_df <- cbind(Actual = rownames(conf_df), conf_df)
rownames(conf_df) <- NULL

# Display using flextable for better formatting
conf_ft <- flextable(conf_df) %>%
  set_caption("Confusion Matrix: Actual vs. Predicted Classifications") %>%
  set_header_labels(Actual = "Actual Outcome") %>%
  add_header_row(values = c("", "Predicted"), colwidths = c(1, 2)) %>%
  theme_vanilla() %>%
  autofit() %>%
  align(align = "center", part = "all") %>%
  bold(part = "header") %>%
  bg(i = 1, j = 2, bg = "#d4edda") %>%  # True negatives - green

  bg(i = 2, j = 3, bg = "#d4edda") %>%  # True positives - green
  bg(i = 1, j = 3, bg = "#f8d7da") %>%  # False positives - red
  bg(i = 2, j = 2, bg = "#f8d7da")      # False negatives - red

conf_ft

Confusion Matrix: Actual vs. Predicted Classifications
	Predicted
Actual Outcome	Cancelled	Completed
Cancelled	28	25
Completed	496	286

# 6. Calculate and display metrics
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
sensitivity <- conf_matrix["Cancelled", "Cancelled"] / sum(conf_matrix["Cancelled", ])
specificity <- conf_matrix["Completed", "Completed"] / sum(conf_matrix["Completed", ])

# Create metrics table
metrics_df <- data.frame(
  Metric = c("Accuracy", "Sensitivity (Recall)", "Specificity"),
  Value = c(paste0(round(accuracy * 100, 1), "%"),
            paste0(round(sensitivity * 100, 1), "%"),
            paste0(round(specificity * 100, 1), "%")),
  Interpretation = c(
    "Overall correct classification rate",
    "Proportion of actual cancellations correctly identified",
    "Proportion of completed procedures correctly identified"
  )
)

flextable(metrics_df) %>%
  set_caption(paste0("Classification Performance Metrics at Optimal Threshold (", optimal_threshold_display, ")")) %>%
  theme_vanilla() %>%
  autofit() %>%
  bold(part = "header")

Classification Performance Metrics at Optimal Threshold (0.024)
Metric	Value	Interpretation
Accuracy	37.6%	Overall correct classification rate
Sensitivity (Recall)	52.8%	Proportion of actual cancellations correctly identified
Specificity	36.6%	Proportion of completed procedures correctly identified

11.0.5 Interpreting the Confusion Matrix

The confusion matrix shows how the model’s predictions compare to actual outcomes. At the optimal threshold of 0.024:

	Predicted: Completed	Predicted: Cancelled
Actually Completed	True Negatives (TN)	False Positives (FP)
Actually Cancelled	False Negatives (FN)	True Positives (TP)

Reading the results:

True Positives (green): Patients correctly predicted to have cancellations
True Negatives (green): Patients correctly predicted to complete their procedures
False Positives (red): Patients predicted to cancel but who actually completed (false alarms)
False Negatives (red): Patients predicted to complete but who actually cancelled (missed detections)

Trade-off at the chosen threshold:

The optimal threshold (0.024) balances sensitivity and specificity. However, clinical considerations may suggest adjusting this threshold:

If missing a cancellation is costly (wasted OR time, patient inconvenience): Use a lower threshold to increase sensitivity, accepting more false positives
If false alarms are problematic (unnecessary interventions, patient anxiety): Use a higher threshold to increase specificity, accepting more missed cancellations

Your classification model predicting whether a procedure would be cancelled achieved good overall accuracy. Given that cancellations represent a minority class (as indicated by your data, with 57 cancellations out of 841 observations), accuracy alone might not fully capture your model’s performance.

The sensitivity (recall) describes the model’s ability to correctly detect patients whose procedures actually were cancelled. A higher sensitivity means fewer missed cancellations. Your calculated sensitivity reflects how well the model identified those who truly experienced cancellations. Conversely, specificity represents the proportion of correctly identified non-cancelled procedures. A high specificity means your model avoids incorrectly labeling procedures as cancelled when they actually occurred as planned.

However, sensitivity and specificity alone don’t provide a full picture—particularly if cancellations are uncommon (as your data suggest, with cancellations being a minority class, around 6.8%). In these scenarios, positive predictive value (PPV) and negative predictive value (NPV) become essential.

A high PPV indicates that when the model predicts cancellation, it is likely correct. In practice, this would help prioritize interventions or preoperative counseling for patients most likely to cancel.
A high NPV would indicate reliability in identifying patients whose procedures likely will not be cancelled, allowing more confident scheduling without unnecessary resource allocation.

Given your results:

Accuracy: Your model correctly classified a good proportion of cases overall, though accuracy alone can be misleading if the dataset is imbalanced.
Sensitivity and specificity indicate reasonable discriminative ability (C-index = 0.774), suggesting your model effectively differentiates between cancellations and non-cancellations.
A Brier score of 0.055 indicates good overall calibration, suggesting your model’s predicted probabilities closely align with actual outcomes.

To fully interpret your results in clinical practice, calculate the PPV and NPV. High values in these metrics would further support your model’s clinical usefulness, making it easier for clinicians to act confidently based on predicted risk levels.

Overall, your model demonstrates good discrimination and calibration, making it useful for anticipating procedural cancellations and enabling targeted management or counseling, potentially reducing cancellations or enhancing operational efficiency.

12 Nomogram

# Open TIFF graphics device BEFORE plotting
tiff(
  filename = "/Users/tylermuffly/Dropbox (Personal)/Tunitsky/Data/nomogram_model.tiff",
  width = 12,
  height = 7,
  units = "in",
  res = 600
)

12.1 Technical Details: Setting Variable Limits for Nomogram Generation

The code below configures the data distribution settings required by the rms package for nomogram generation. The datadist object defines the range of values for Age and BMI that will be displayed on the nomogram axes.

# Load required packages
library(rms)
library(logger)

# Set up logging
log_info("Starting procedure cancellation nomogram creation")

# Ensure dataset meets requirements
if (!exists("selected_labels_df")) {
    stop("Error: selected_labels_df not found")
}

# ============================================================================
# CLEAN VARIABLE NAMES AND FACTOR LEVELS FOR NOMOGRAM DISPLAY
# This ensures the nomogram has readable labels instead of ugly R variable names
# ============================================================================

log_info("Cleaning variable names and factor levels for nomogram display")

# Create a copy for nomogram display
nomogram_df <- selected_labels_df

# COMPREHENSIVE VARIABLE NAME MAPPING (old name -> new clean name)
# Using underscores instead of spaces to avoid formula issues
var_name_map <- c(
  "Age." = "Age",
  "BMI" = "BMI",
  "Is.the.patient.hispanic..latino.or.of.Spanish.origin." = "Hispanic",
  "Is.the.patient.Hispanic.or.Latino." = "Hispanic",
  "Race." = "Race",
  "Does.the.patient.have.a.h.o.recurrent.UTIs." = "Recurrent_UTIs",
  "Does.the.patient.have.recurrent.UTIs." = "Recurrent_UTIs",
  "POP.Q.stage." = "POP_Q_Stage",
  "POP.Q.stage" = "POP_Q_Stage",
  "What.was.the.reason.for.the.urodynamics." = "UDS_Indication",
  "What.is.the.reason.for.the.urodynamics." = "UDS_Indication",
  "Does.the.patient.have.diabetes." = "Diabetes",
  "Does.the.patient.have.OAB." = "OAB",
  "Is.the.patient.on.vaginal.estrogen." = "Vaginal_Estrogen",
  "Immunocompromised." = "Immunocompromised",
  "Tobacco.use." = "Tobacco_Use",
  "Menopause.status." = "Menopause",
  "Average.number.of.voids.at.night." = "Nocturia",
  "Year" = "Year",
  "PFDI.20.Score" = "PFDI_20",
  "UDI_6" = "UDI_6",
  "POPDI_6" = "POPDI_6",
  "CRADI_8" = "CRADI_8"
)

# Rename columns that exist in the data
for (old_name in names(var_name_map)) {
  if (old_name %in% names(nomogram_df)) {
    new_name <- var_name_map[old_name]
    names(nomogram_df)[names(nomogram_df) == old_name] <- new_name
    log_info(paste("Renamed:", old_name, "->", new_name))
  }
}

# CLEAN FACTOR LEVELS - Replace abbreviations with readable labels
# Hispanic/Latino
if ("Hispanic" %in% names(nomogram_df)) {
  nomogram_df$Hispanic <- factor(nomogram_df$Hispanic)
  levels(nomogram_df$Hispanic) <- gsub("Yes|yes|Y|1|TRUE", "Yes", levels(nomogram_df$Hispanic))
  levels(nomogram_df$Hispanic) <- gsub("No|no|N|0|FALSE", "No", levels(nomogram_df$Hispanic))
}

# Recurrent UTIs
if ("Recurrent_UTIs" %in% names(nomogram_df)) {
  nomogram_df$Recurrent_UTIs <- factor(nomogram_df$Recurrent_UTIs)
  old_levels <- levels(nomogram_df$Recurrent_UTIs)
  new_levels <- ifelse(grepl("Yes|yes|Y|1|TRUE", old_levels), "Yes",
                       ifelse(grepl("No|no|N|0|FALSE", old_levels), "No", old_levels))
  levels(nomogram_df$Recurrent_UTIs) <- new_levels
}

# Diabetes
if ("Diabetes" %in% names(nomogram_df)) {
  nomogram_df$Diabetes <- factor(nomogram_df$Diabetes)
  old_levels <- levels(nomogram_df$Diabetes)
  new_levels <- ifelse(grepl("Yes|yes|Y|1|TRUE", old_levels), "Yes",
                       ifelse(grepl("No|no|N|0|FALSE", old_levels), "No", old_levels))
  levels(nomogram_df$Diabetes) <- new_levels
}

# OAB
if ("OAB" %in% names(nomogram_df)) {
  nomogram_df$OAB <- factor(nomogram_df$OAB)
  old_levels <- levels(nomogram_df$OAB)
  new_levels <- ifelse(grepl("Yes|yes|Y|1|TRUE", old_levels), "Yes",
                       ifelse(grepl("No|no|N|0|FALSE", old_levels), "No", old_levels))
  levels(nomogram_df$OAB) <- new_levels
}

# Vaginal Estrogen
if ("Vaginal_Estrogen" %in% names(nomogram_df)) {
  nomogram_df$Vaginal_Estrogen <- factor(nomogram_df$Vaginal_Estrogen)
  old_levels <- levels(nomogram_df$Vaginal_Estrogen)
  new_levels <- ifelse(grepl("Yes|yes|Y|1|TRUE", old_levels), "Yes",
                       ifelse(grepl("No|no|N|0|FALSE", old_levels), "No", old_levels))
  levels(nomogram_df$Vaginal_Estrogen) <- new_levels
}

# Immunocompromised
if ("Immunocompromised" %in% names(nomogram_df)) {
  nomogram_df$Immunocompromised <- factor(nomogram_df$Immunocompromised)
  old_levels <- levels(nomogram_df$Immunocompromised)
  new_levels <- ifelse(grepl("Yes|yes|Y|1|TRUE", old_levels), "Yes",
                       ifelse(grepl("No|no|N|0|FALSE", old_levels), "No", old_levels))
  levels(nomogram_df$Immunocompromised) <- new_levels
}

# Tobacco Use
if ("Tobacco_Use" %in% names(nomogram_df)) {
  nomogram_df$Tobacco_Use <- factor(nomogram_df$Tobacco_Use)
  old_levels <- levels(nomogram_df$Tobacco_Use)
  new_levels <- ifelse(grepl("Yes|yes|Y|1|TRUE|Current|Former", old_levels), "Yes",
                       ifelse(grepl("No|no|N|0|FALSE|Never", old_levels), "No", old_levels))
  levels(nomogram_df$Tobacco_Use) <- new_levels
}

# Menopause Status
if ("Menopause" %in% names(nomogram_df)) {
  nomogram_df$Menopause <- factor(nomogram_df$Menopause)
  old_levels <- levels(nomogram_df$Menopause)
  new_levels <- gsub("Post-menopausal|Postmenopausal|post", "Post", old_levels)
  new_levels <- gsub("Pre-menopausal|Premenopausal|pre", "Pre", new_levels)
  new_levels <- gsub("Peri-menopausal|Perimenopausal|peri", "Peri", new_levels)
  levels(nomogram_df$Menopause) <- new_levels
}

# POP-Q Stage - convert to Roman numerals or simple numbers
if ("POP_Q_Stage" %in% names(nomogram_df)) {
  nomogram_df$POP_Q_Stage <- factor(nomogram_df$POP_Q_Stage)
  old_levels <- levels(nomogram_df$POP_Q_Stage)
  # Map common POP-Q values
  new_levels <- gsub("^0$|Stage 0|stage 0", "0", old_levels)
  new_levels <- gsub("^1$|^I$|Stage 1|stage 1|Stage I", "I", new_levels)
  new_levels <- gsub("^2$|^II$|Stage 2|stage 2|Stage II", "II", new_levels)
  new_levels <- gsub("^3$|^III$|Stage 3|stage 3|Stage III", "III", new_levels)
  new_levels <- gsub("^4$|^IV$|Stage 4|stage 4|Stage IV", "IV", new_levels)
  levels(nomogram_df$POP_Q_Stage) <- new_levels
}

# UDS Indication - make VERY SHORT labels to prevent overlap
if ("UDS_Indication" %in% names(nomogram_df)) {
  nomogram_df$UDS_Indication <- factor(nomogram_df$UDS_Indication)
  old_levels <- levels(nomogram_df$UDS_Indication)
  log_info(paste("Original UDS levels:", paste(old_levels, collapse = " | ")))

  # Use very short abbreviations - comprehensive pattern matching
  new_levels <- old_levels
  new_levels <- gsub(".*[Ss]ling.*", "SL", new_levels)
  new_levels <- gsub(".*[Pp]re-?op.*|.*[Pp]reop.*", "Pre", new_levels)
  new_levels <- gsub(".*[Ii]ncontinence.*|.*SUI.*|.*[Uu]rinary.*[Ii]ncont.*", "UI", new_levels)
  new_levels <- gsub(".*POP.*|.*[Pp]rolapse.*", "POP", new_levels)
  new_levels <- gsub(".*[Vv]oiding.*[Dd]ysf.*|.*[Vv]oiding.*|.*VD.*", "VD", new_levels)
  new_levels <- gsub(".*[Rr]ecurrent.*UTI.*|.*rUTI.*", "rUTI", new_levels)
  new_levels <- gsub(".*[Nn]eurogenic.*[Bb]ladder.*|.*[Nn]eurogenic.*|.*NB.*", "NB", new_levels)
  new_levels <- gsub(".*OAB.*|.*[Oo]veractive.*[Bb]ladder.*", "OAB", new_levels)
  new_levels <- gsub(".*[Oo]ther.*", "Oth", new_levels)
  # Clean up any remaining long labels
  new_levels <- gsub("evaluation of ", "", new_levels)
  new_levels <- gsub("Evaluation of ", "", new_levels)
  new_levels <- gsub(" ", "", new_levels)  # Remove any spaces

  levels(nomogram_df$UDS_Indication) <- new_levels
  log_info(paste("Cleaned UDS levels:", paste(levels(nomogram_df$UDS_Indication), collapse = ", ")))
}

# Race - clean up race categories
if ("Race" %in% names(nomogram_df)) {
  nomogram_df$Race <- factor(nomogram_df$Race)
  old_levels <- levels(nomogram_df$Race)
  new_levels <- gsub(".*[Ww]hite.*|.*[Cc]aucasian.*", "White", old_levels)
  new_levels <- gsub(".*[Bb]lack.*|.*African.*", "Black", new_levels)
  new_levels <- gsub(".*[Hh]ispanic.*|.*[Ll]atino.*", "Hispanic", new_levels)
  new_levels <- gsub(".*[Aa]sian.*", "Asian", new_levels)
  new_levels <- gsub(".*[Oo]ther.*|.*[Mm]ulti.*|.*[Mm]ixed.*", "Other", new_levels)
  levels(nomogram_df$Race) <- new_levels
}

# Also rename the outcome variable
if ("Was.the.procedure.cancelled." %in% names(nomogram_df)) {
  names(nomogram_df)[names(nomogram_df) == "Was.the.procedure.cancelled."] <- "Cancelled"
}

# SET RMS LABELS for nice display on nomogram axes
# These will appear as the variable names on the left side of the nomogram
label_display_map <- c(
  "Age" = "Patient Age (years)",
  "BMI" = "Body Mass Index",
  "Hispanic" = "Hispanic/Latino",
  "Race" = "Race",
  "Recurrent_UTIs" = "Recurrent UTIs",
  "POP_Q_Stage" = "POP-Q Stage",
  "UDS_Indication" = "Indication for UDS",
  "Diabetes" = "Diabetes",
  "OAB" = "Overactive Bladder",
  "Vaginal_Estrogen" = "Vaginal Estrogen",
  "Immunocompromised" = "Immunocompromised",
  "Tobacco_Use" = "Tobacco Use",
  "Menopause" = "Menopause Status",
  "Nocturia" = "Nocturia (per night)",
  "Year" = "Year",
  "PFDI_20" = "PFDI-20 Score",
  "UDI_6" = "UDI-6 Score",
  "POPDI_6" = "POPDI-6 Score",
  "CRADI_8" = "CRADI-8 Score"
)

for (col in names(nomogram_df)) {
  if (col %in% names(label_display_map)) {
    Hmisc::label(nomogram_df[[col]]) <- label_display_map[[col]]
    log_info(paste("Set label for", col, "->", label_display_map[[col]]))
  }
}

log_info("Variable and factor level cleaning complete")

# ============================================================================
# DYNAMIC MODEL SETUP FOR NOMOGRAM
# Use the LASSO-selected predictors from nomogram_df (cleaned version)
# This ensures the nomogram is based on LASSO feature selection results
# ============================================================================

log_info("Setting up nomogram with LASSO-selected predictors")

# Get predictors from nomogram_df (which contains LASSO-selected variables with clean names)
exclude_cols <- c("Cancelled", "Was.the.procedure.cancelled.")
available_cols <- setdiff(names(nomogram_df), exclude_cols)

if (length(available_cols) == 0) {
  stop("Error: No predictor columns found in nomogram_df. Check LASSO feature selection.")
}

log_info(paste("Using predictors for nomogram:", paste(available_cols, collapse = ", ")))

# Verify the outcome column exists
if (!"Cancelled" %in% names(nomogram_df)) {
  stop("Error: Outcome column 'Cancelled' not found in nomogram_df")
}

# Log ranges for numeric predictors
for (col in available_cols) {
  if (is.numeric(nomogram_df[[col]])) {
    log_info(paste("Current", col, "range:",
                   min(nomogram_df[[col]], na.rm=TRUE),
                   "to", max(nomogram_df[[col]], na.rm=TRUE)))
  } else {
    log_info(paste(col, "is categorical with levels:",
                   paste(levels(factor(nomogram_df[[col]])), collapse = ", ")))
  }
}

# ✅ Step 1: Explicitly Recreate and Apply `datadist`
log_info("Creating and applying datadist")
dd <- rms::datadist(nomogram_df)

# Set reasonable limits for numeric predictors
if ("Age" %in% available_cols && is.numeric(nomogram_df$Age)) {
  dd$limits["Low", "Age"] <- 20
  dd$limits["High", "Age"] <- 90
  log_info(paste("Age limits set to:", dd$limits["Low", "Age"], "to", dd$limits["High", "Age"]))
}

if ("BMI" %in% available_cols && is.numeric(nomogram_df$BMI)) {
  dd$limits["Low", "BMI"] <- 20
  dd$limits["High", "BMI"] <- 60
  dd$limits["Adjust to", "BMI"] <- 30  # Reference value
  log_info(paste("BMI limits set to:", dd$limits["Low", "BMI"], "to", dd$limits["High", "BMI"]))
}

options(datadist = dd)  # Ensure it is applied globally
log_info("datadist created and applied globally")

# ✅ Step 2: Variable names are already clean - just store for later use
log_info("Variable names already cleaned - storing for display")
predictor_names_display <- available_cols

# ✅ Step 3: Refit Model with dynamic predictors using CLEANED data
log_info("Fitting logistic regression model with dynamic predictors")

# Build formula from available columns (using cleaned outcome name)
nomogram_formula <- as.formula(paste("Cancelled ~",
                                      paste(available_cols, collapse = " + ")))
log_info(paste("Nomogram formula:", deparse(nomogram_formula)))

model <- rms::lrm(
    nomogram_formula,
    data = nomogram_df,
    x = TRUE,
    y = TRUE
)
log_info("Model fitted successfully")
log_info(paste("Model C-statistic:", model$stats["C"]))
log_info(paste("Number of predictors:", length(available_cols)))

# ✅ Step 4: Generate the Nomogram with updated funlabel
log_info("Generating nomogram with forced limits and custom labels")
nomogram_model <- rms::nomogram(
    model,
    fun = plogis,
    fun.at = c(0.05, 0.10, 0.20, 0.30, 0.40),
    lp = FALSE,
    funlabel = "Probability"
)
log_info("Nomogram created successfully")

# ✅ Step 5: Plot (hidden - improved version shown below)
log_info("Setting up plot parameters and generating plot")
par(mar = c(5, 5, 2, 2))  # Adjust margins if needed
plot(nomogram_model)  # Plot the nomogram

# Save parameters for reproducibility (dynamic based on selected predictors)
session_info <- list(
  r_version = R.version.string,
  rms_version = packageVersion("rms"),
  date_created = Sys.Date(),
  probability_sequence = seq(0.05, 0.45, by = 0.1),
  model_variables = available_cols,
  variable_labels = predictor_names_display,
  outcome_label = "Probability",
  n_predictors = length(available_cols)
)
log_info("Saving session information for reproducibility")
log_info(paste("Analysis completed on", Sys.Date()))
# Note: dev.off() removed - it was closing the graphics device prematurely

# ============================================================================
# PUBLICATION-QUALITY NOMOGRAM - CLEAN VERSION
# Large text, no instructions, high resolution
# ============================================================================

# Regenerate nomogram with better tick mark control
nomogram_model <- rms::nomogram(
    model,
    fun = plogis,
    fun.at = c(0.05, 0.1, 0.2, 0.3, 0.4),  # Fewer probability ticks - no overlap
    lp = FALSE,
    funlabel = "Probability of Cancellation",
    nint = 5,
    BMI = c(20, 30, 40, 50, 60),
    Age = seq(20, 90, by = 10)
)

# Set up plot - CLEAN with large text, WIDER for UDS labels
par(mar = c(4, 5, 4, 4),  # More right margin for UDS labels
    mgp = c(3, 1, 0),
    tcl = -0.5,           # Longer tick marks
    las = 1,
    font.lab = 2,
    lwd = 2)              # Thicker lines

# Plot with LARGE text but smaller axis labels to prevent overlap
plot(nomogram_model,
     cex.axis = 1.1,      # Slightly smaller axis labels to prevent overlap
     cex.var = 1.5,       # Large variable names
     col.grid = gray(0.90),
     col.axis = "gray10", # Darker axis text
     lmgp = 0.35,
     xfrac = 0.20,        # Even more space for scales (less for var names)
     label.every = 1,
     force.label = FALSE, # Don't force overlapping labels
     lwd = 2,
     col = "navy")

# Add prominent title only
title(main = "Clinical Prediction Nomogram",
      font.main = 2,
      cex.main = 2.2,     # LARGER title
      col.main = "#1a1a2e",
      line = 1.5)

# Add subtitle
mtext("Urodynamic Procedure Cancellation Risk",
      side = 3,
      line = 0.3,
      cex = 1.4,          # LARGER subtitle
      col = "#34495E",
      font = 2)

# Add abbreviation legend only (no instructions)
mtext("NB=Neurogenic Bladder | OAB=Overactive Bladder | POP=Prolapse | VD=Voiding Dysfunction | UI=Incontinence | SL=Sling | Pre=Pre-op",
      side = 1,
      line = 2.5,
      cex = 0.95,
      col = "#444444",
      font = 1)

Figure 2. Clinical Prediction Nomogram for Urodynamic Procedure Cancellation

12.2 Bootstrap Internal Validation

Bootstrap validation provides optimism-corrected estimates of model performance by resampling with replacement. This technique helps assess how much the apparent model performance may be inflated due to overfitting.

# Set seed for reproducibility
set.seed(42)

# Perform bootstrap validation with 1000 resamples
# This validates the lrm model and provides optimism-corrected statistics
log_info("Starting bootstrap validation with 1000 resamples...")

bootstrap_validation <- rms::validate(model, B = 1000, pr = FALSE)

log_info("Bootstrap validation completed")

# Convert to data frame for kable display
# The validate object is a matrix with special class, need to handle carefully
bootstrap_matrix <- unclass(bootstrap_validation)
bootstrap_df <- data.frame(
  Metric = rownames(bootstrap_matrix),
  Apparent = round(bootstrap_matrix[, "index.orig"], 4),
  Training = round(bootstrap_matrix[, "training"], 4),
  Test = round(bootstrap_matrix[, "test"], 4),
  Optimism = round(bootstrap_matrix[, "optimism"], 4),
  Corrected = round(bootstrap_matrix[, "index.corrected"], 4),
  n = bootstrap_matrix[, "n"]
)

# Display as formatted kable table
kable(bootstrap_df,
      col.names = c("Metric", "Apparent", "Training", "Test", "Optimism", "Corrected", "n"),
      caption = "Bootstrap Internal Validation Results (B = 1000 resamples)",
      align = c("l", "c", "c", "c", "c", "c", "c"),
      row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 13) %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(6, bold = TRUE, color = "white", background = "#27AE60") %>%
  add_header_above(c(" " = 1, "Performance Indices" = 5, " " = 1))

Bootstrap Internal Validation Results (B = 1000 resamples)
	Performance Indices
Metric	Apparent	Training	Test	Optimism	Corrected	n
Dxy	0.5847	0.6349	0.5169	0.1180	0.4666	632
R2	0.1802	0.2253	0.1100	0.1153	0.0650	632
Intercept	0.0000	0.0000	-0.9209	0.9209	-0.9209	632
Slope	1.0000	1.0000	0.6041	0.3959	0.6041	632
Emax	0.0000	0.0000	0.3213	0.3213	0.3213	632
D	0.0698	0.0883	0.0417	0.0466	0.0232	632
U	-0.0024	-0.0024	0.0225	-0.0248	0.0225	632
Q	0.0721	0.0906	0.0192	0.0714	0.0007	632
B	0.0547	0.0526	0.0568	-0.0041	0.0589	632
g	1.4773	2.1226	1.1347	0.9879	0.4895	632
gp	0.0699	0.0764	0.0530	0.0234	0.0465	632

# Extract key metrics
apparent_dxy <- bootstrap_validation["Dxy", "index.orig"]
optimism_dxy <- bootstrap_validation["Dxy", "optimism"]
corrected_dxy <- bootstrap_validation["Dxy", "index.corrected"]

# Convert Dxy to C-statistic (C = 0.5 + Dxy/2)
apparent_c <- 0.5 + apparent_dxy/2
corrected_c <- 0.5 + corrected_dxy/2
optimism_c <- apparent_c - corrected_c

# Create C-statistic summary table
c_stat_summary <- data.frame(
  Measure = c("Apparent C-statistic", "Optimism", "Optimism-corrected C-statistic"),
  Value = c(
    sprintf("%.3f", apparent_c),
    sprintf("%.3f", optimism_c),
    sprintf("%.3f", corrected_c)
  ),
  Interpretation = c(
    "Performance on training data",
    sprintf("Expected decrease (%.1f%%)", optimism_c * 100),
    "Realistic estimate for new data"
  )
)

kable(c_stat_summary,
      caption = "C-Statistic Summary",
      align = c("l", "c", "l")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(3, bold = TRUE, background = "#E8F6F3")

C-Statistic Summary
Measure	Value	Interpretation
Apparent C-statistic	0.792	Performance on training data
Optimism	0.059	Expected decrease (5.9%)
Optimism-corrected C-statistic	0.733	Realistic estimate for new data

Interpretation of Bootstrap Validation:

Index.orig: The apparent (training) performance
Training: Average performance on bootstrap samples
Test: Average performance when predicting the original data from bootstrap models
Optimism: Difference between training and test performance (overfitting measure)
Index.corrected: Optimism-corrected estimate (most realistic for new data)

The optimism-corrected C-statistic of 0.733 represents a more realistic estimate of how the model would perform on new, unseen patients.

12.3 Temporal Validation

Temporal validation assesses model generalizability by training on earlier data and testing on later data, simulating prospective application of the model.

# ============================================================================
# TEMPORAL VALIDATION: Train on earlier years, validate on most recent year
# ============================================================================

# Check if Year variable exists and has multiple levels
if ("Year" %in% names(selected_labels_df)) {

  # Get year distribution
  year_table <- table(selected_labels_df$Year)
  years_available <- sort(as.numeric(names(year_table)))

  cat("Year distribution in dataset:\n")
  print(year_table)
  cat("\n")

  # Use most recent year as validation set
  validation_year <- max(years_available)
  development_years <- years_available[years_available < validation_year]

  # ============================================================================
  # TEMPORAL VALIDATION DATA PREPARATION
  # Steps: 1) Split by year, 2) Set factor levels from DEV data only,
  # 3) Drop VAL observations with new levels not seen in DEV
  # ============================================================================

  # Step 1: Split data by year FIRST (before any factor manipulation)
  dev_data <- selected_labels_df[selected_labels_df$Year %in% development_years, ]
  val_data <- selected_labels_df[selected_labels_df$Year == validation_year, ]

  # Step 2: Remove Year from both datasets (not a predictor)
  dev_data$Year <- NULL
  val_data$Year <- NULL

  # Step 3: Convert all columns to character first, then to factors using DEV levels only
  # This ensures the model only knows about levels that exist in training data
  for (col in names(dev_data)) {
    if (is.factor(dev_data[[col]]) || is.character(dev_data[[col]])) {
      # Convert to character and clean
      dev_data[[col]] <- trimws(as.character(dev_data[[col]]))
      val_data[[col]] <- trimws(as.character(val_data[[col]]))

      # Get levels from DEVELOPMENT data only
      dev_levels <- sort(unique(na.omit(dev_data[[col]])))

      # Convert dev_data to factor with its own levels
      dev_data[[col]] <- factor(dev_data[[col]], levels = dev_levels)
    }
  }

  # Step 4: Handle validation data - identify observations with new factor levels
  val_n_before <- nrow(val_data)
  rows_to_keep <- rep(TRUE, nrow(val_data))

  for (col in names(val_data)) {
    if (is.factor(dev_data[[col]])) {
      dev_levels <- levels(dev_data[[col]])
      val_values <- val_data[[col]]  # Still character at this point

      # Check for observations with values not in dev_levels
      invalid_rows <- !is.na(val_values) & !(val_values %in% dev_levels)
      if (any(invalid_rows)) {
        new_levels <- unique(val_values[invalid_rows])
        cat(sprintf("  Note: %s has %d observations with new levels not in development data: %s\n",
                    col, sum(invalid_rows), paste(new_levels, collapse = ", ")))
        rows_to_keep <- rows_to_keep & !invalid_rows
      }

      # Now convert validation column to factor with DEV levels
      val_data[[col]] <- factor(val_data[[col]], levels = dev_levels)
    }
  }

  # Apply filter to remove observations with new factor levels
  val_data <- val_data[rows_to_keep, ]
  val_n_after <- nrow(val_data)

  if (val_n_before != val_n_after) {
    cat(sprintf("  Validation set reduced from %d to %d observations (%.1f%%) due to new factor levels\n",
                val_n_before, val_n_after, 100 * val_n_after / val_n_before))
    cat("  This is expected in temporal validation - new categories emerge over time\n\n")
  }

  # Drop unused factor levels to ensure clean factor structures
  val_data <- droplevels(val_data)
  dev_data <- droplevels(dev_data)

  # Check if we have enough validation data
  if (nrow(val_data) < 20) {
    cat("⚠️ Insufficient validation observations after filtering (N=", nrow(val_data), ").\n")
    temporal_validation_performed <- FALSE
  } else {

  # Determine the "event" level (could be "Yes" or "Cancelled" depending on data processing)
  outcome_levels <- levels(dev_data$Was.the.procedure.cancelled.)
  event_level <- if ("Cancelled" %in% outcome_levels) "Cancelled" else if ("Yes" %in% outcome_levels) "Yes" else outcome_levels[2]
  cat("Event level detected:", event_level, "\n")

  cat(sprintf("Development set: Years %s (N = %d, %d events)\n",
              paste(development_years, collapse = ", "),
              nrow(dev_data),
              sum(dev_data$Was.the.procedure.cancelled. == event_level, na.rm = TRUE)))
  cat(sprintf("Validation set: Year %d (N = %d, %d events)\n",
              validation_year,
              nrow(val_data),
              sum(val_data$Was.the.procedure.cancelled. == event_level, na.rm = TRUE)))

  # Store counts for reporting
  temporal_dev_n <- nrow(dev_data)
  temporal_dev_events <- sum(dev_data$Was.the.procedure.cancelled. == event_level, na.rm = TRUE)
  temporal_val_n <- nrow(val_data)
  temporal_val_events <- sum(val_data$Was.the.procedure.cancelled. == event_level, na.rm = TRUE)
  temporal_dev_years <- paste(development_years, collapse = "-")
  temporal_val_year <- validation_year

  # Build formula using the column names
  # Get predictor names (all columns except outcome)
  predictor_cols <- setdiff(names(dev_data), "Was.the.procedure.cancelled.")

  # Create clean column names for formula (backtick special characters)
  clean_formula_terms <- sapply(predictor_cols, function(x) {
    if (grepl("[^a-zA-Z0-9_.]", x) || grepl("^[0-9]", x)) {
      paste0("`", x, "`")
    } else {
      x
    }
  })

  temporal_formula <- as.formula(paste("Was.the.procedure.cancelled. ~",
                                        paste(clean_formula_terms, collapse = " + ")))
  cat("Temporal validation formula:", deparse(temporal_formula), "\n")

  # Debug: Print factor levels to verify they match
  cat("\nFactor level verification:\n")
  for (col in predictor_cols) {
    if (is.factor(dev_data[[col]])) {
      cat(sprintf("  %s: dev=%s, val=%s\n",
                  col,
                  paste(levels(dev_data[[col]]), collapse=","),
                  paste(levels(val_data[[col]]), collapse=",")))
    }
  }

  # Convert outcome to binary numeric for standard glm
  # Using standard glm() instead of rms::lrm() for temporal validation
  # This avoids rms-specific factor level issues while providing identical predictions
  dev_data$outcome_binary <- as.numeric(dev_data$Was.the.procedure.cancelled. == event_level)
  val_data$outcome_binary <- as.numeric(val_data$Was.the.procedure.cancelled. == event_level)

  # Update formula to use numeric outcome
  glm_formula <- as.formula(paste("outcome_binary ~",
                                   paste(clean_formula_terms, collapse = " + ")))

  # Fit development model using standard glm (more robust for prediction)
  model_dev <- tryCatch({
    glm(glm_formula, data = dev_data, family = binomial(link = "logit"))
  }, error = function(e) {
    log_warn("Temporal validation model fitting failed: %s", e$message)
    NULL
  })

  if (!is.null(model_dev)) {
    # Calculate C-statistic on development data using fitted values
    # Use model$fitted.values to ensure alignment (glm may exclude NA rows)
    dev_predictions <- model_dev$fitted.values
    dev_outcome_used <- model_dev$y  # The actual outcome used by glm
    dev_roc <- pROC::roc(dev_outcome_used, dev_predictions, quiet = TRUE)
    dev_c_stat <- as.numeric(dev_roc$auc)
    cat("Development model fitted successfully. C-statistic:", round(dev_c_stat, 3), "\n")
    cat("Model included", length(dev_predictions), "observations (some may have been excluded due to NA)\n")

    # Get predictions on validation set
    val_predictions <- tryCatch({
      predict(model_dev, newdata = val_data, type = "response")
    }, error = function(e) {
      log_warn("Prediction on validation set failed: %s", e$message)
      NULL
    })

    if (!is.null(val_predictions) && length(val_predictions) > 0) {
      # Calculate validation C-statistic
      val_outcome <- as.numeric(val_data$Was.the.procedure.cancelled. == event_level)

      # Remove any NA values
      valid_idx <- !is.na(val_predictions) & !is.na(val_outcome)
      val_predictions_clean <- val_predictions[valid_idx]
      val_outcome_clean <- val_outcome[valid_idx]

      if (length(unique(val_outcome_clean)) > 1 && length(val_predictions_clean) > 10) {
        val_roc <- pROC::roc(val_outcome_clean, val_predictions_clean, quiet = TRUE)
        val_c_stat <- as.numeric(val_roc$auc)
        val_c_ci <- pROC::ci.auc(val_roc, conf.level = 0.95)

        # dev_c_stat already calculated above when fitting the glm model

        # ================================================================
        # Additional validation metrics: Calibration and Brier Score
        # ================================================================

        # Calculate Brier scores
        dev_brier <- mean((dev_outcome_used - model_dev$fitted.values)^2)
        val_brier <- mean((val_outcome_clean - val_predictions_clean)^2)

        # Calculate calibration slope and intercept for validation set
        # Calibration slope = 1 means perfect calibration
        # Calibration intercept = 0 (calibration-in-the-large) means correct overall risk
        cal_model <- tryCatch({
          glm(val_outcome_clean ~ offset(qlogis(val_predictions_clean)),
              family = binomial(link = "logit"))
        }, error = function(e) NULL)

        if (!is.null(cal_model)) {
          cal_intercept <- coef(cal_model)[1]  # Should be ~0
        } else {
          cal_intercept <- NA
        }

        cal_slope_model <- tryCatch({
          glm(val_outcome_clean ~ qlogis(val_predictions_clean),
              family = binomial(link = "logit"))
        }, error = function(e) NULL)

        if (!is.null(cal_slope_model)) {
          cal_slope <- coef(cal_slope_model)[2]  # Should be ~1
        } else {
          cal_slope <- NA
        }

        # Event rates
        dev_event_rate <- temporal_dev_events / temporal_dev_n * 100
        val_event_rate <- temporal_val_events / nrow(val_data) * 100

        # Create comprehensive comparison table
        temporal_comparison <- data.frame(
          Dataset = c("Development", "Temporal Validation"),
          Years = c(temporal_dev_years, as.character(temporal_val_year)),
          N = c(temporal_dev_n, nrow(val_data)),
          Events = c(temporal_dev_events, temporal_val_events),
          Event_Rate = c(sprintf("%.1f%%", dev_event_rate),
                        sprintf("%.1f%%", val_event_rate)),
          C_statistic = c(sprintf("%.3f", dev_c_stat),
                         sprintf("%.3f (95%% CI: %.3f-%.3f)",
                                 val_c_stat, val_c_ci[1], val_c_ci[3])),
          Brier = c(sprintf("%.3f", dev_brier),
                   sprintf("%.3f", val_brier))
        )

        kable(temporal_comparison,
              col.names = c("Dataset", "Years", "N", "Events", "Event Rate",
                           "C-statistic", "Brier Score"),
              caption = "Temporal Validation: Development vs. Validation Performance",
              align = c("l", "c", "c", "c", "c", "c", "c")) %>%
          kable_styling(bootstrap_options = c("striped", "hover"),
                        full_width = FALSE,
                        font_size = 14) %>%
          row_spec(2, bold = TRUE, background = "#FEF9E7")

        # Calibration metrics table
        if (!is.na(cal_slope) && !is.na(cal_intercept)) {
          cat("\n\n**Calibration Metrics (Validation Set):**\n\n")
          cal_metrics <- data.frame(
            Metric = c("Calibration Slope", "Calibration Intercept"),
            Value = c(sprintf("%.3f", cal_slope), sprintf("%.3f", cal_intercept)),
            Ideal = c("1.000", "0.000"),
            Interpretation = c(
              ifelse(abs(cal_slope - 1) < 0.2, "Good", ifelse(abs(cal_slope - 1) < 0.4, "Moderate", "Poor")),
              ifelse(abs(cal_intercept) < 0.2, "Good", ifelse(abs(cal_intercept) < 0.5, "Moderate", "Poor"))
            )
          )
          print(kable(cal_metrics, align = c("l", "c", "c", "l")) %>%
                  kable_styling(bootstrap_options = c("striped", "hover"),
                               full_width = FALSE, font_size = 13))
        }

        # Store for later reporting
        temporal_val_c <- val_c_stat
        temporal_val_c_lower <- val_c_ci[1]
        temporal_val_c_upper <- val_c_ci[3]
        temporal_val_brier <- val_brier
        temporal_cal_slope <- cal_slope
        temporal_cal_intercept <- cal_intercept
        temporal_val_n <- nrow(val_data)  # Update with filtered count
        temporal_validation_performed <- TRUE

      } else {
        cat("⚠️ Insufficient validation events for reliable temporal validation.\n")
        temporal_validation_performed <- FALSE
      }
    } else {
      cat("⚠️ Could not generate predictions for validation set.\n")
      temporal_validation_performed <- FALSE
    }
  } else {
    cat("⚠️ Could not fit development model for temporal validation.\n")
    temporal_validation_performed <- FALSE
  }

  }  # Close the "if (nrow(val_data) >= 20)" else block

  # Reset datadist to main model
  options(datadist = "dd")

} else {
  cat("⚠️ Year variable not available for temporal validation.\n")
  temporal_validation_performed <- FALSE
}

Year distribution in dataset:

2022 2023 2024 310 355 176

Note: What.was.the.reason.for.the.urodynamics. has 1 observations with new levels not in development data: evaluation of neurogenic bladder Validation set reduced from 176 to 175 observations (99.4%) due to new factor levels This is expected in temporal validation - new categories emerge over time

Event level detected: Cancelled Development set: Years 2022, 2023 (N = 665, 45 events) Validation set: Year 2024 (N = 175, 12 events) Temporal validation formula: Was.the.procedure.cancelled. ~ Age. + BMI + Is.the.patient.hispanic..latino.or.of.Spanish.origin. + Does.the.patient.have.a.h.o.recurrent.UTIs. + POP.Q.stage. + What.was.the.reason.for.the.urodynamics.

Factor level verification: Is.the.patient.hispanic..latino.or.of.Spanish.origin.: dev=No,Yes, val=No,Yes Does.the.patient.have.a.h.o.recurrent.UTIs.: dev=No,Yes, val=No,Yes POP.Q.stage.: dev=0,1,2,3,4, val=0,1,2,3,4 What.was.the.reason.for.the.urodynamics.: dev=evaluation of mixed incontinence,evaluation of OAB,evaluation of voiding dysfunction,other,pre-op for prolapse surgery,pre-op for sling, val=evaluation of mixed incontinence,evaluation of OAB,evaluation of voiding dysfunction,other,pre-op for prolapse surgery,pre-op for sling Development model fitted successfully. C-statistic: 0.803 Model included 662 observations (some may have been excluded due to NA)

Calibration Metrics (Validation Set):

	Metric	Value	Ideal	Interpretation
qlogis(val_predictions_clean)	Calibration Slope	0.525	1.000	Poor
(Intercept)	Calibration Intercept	0.176	0.000	Good

Temporal Validation Results: The model was developed on data from 2022-2023 (N=665) and validated on 2024 data (N=175). The validation C-statistic of 0.713 (95% CI: 0.616-0.809) demonstrates good discrimination in temporally distinct data. The Brier score of 0.065 indicates good overall accuracy. Calibration slope (0.53) and intercept (0.18) suggest some calibration. These results support the model’s generalizability to future patients.

12.4 Model Comparison

To demonstrate that the selected predictor model provides meaningful predictive value, we compare it to simpler alternatives using likelihood ratio tests and information criteria.

# ============================================================================
# DYNAMIC MODEL COMPARISON: Null vs First-predictor vs Full model
# Uses the nomogram_df with cleaned variable names from nomogram-setup
# ============================================================================

log_info("Fitting comparison models...")

# Create binary outcome for glm (using nomogram_df with Cancelled outcome)
outcome_binary <- as.numeric(nomogram_df$Cancelled == "Cancelled")

# Null model (intercept only) - use glm since lrm has issues with ~ 1
model_null_glm <- glm(
  outcome_binary ~ 1,
  family = binomial(link = "logit")
)

# First predictor model (using first available predictor from cleaned names)
first_predictor <- available_cols[1]
first_predictor_formula <- as.formula(paste("Cancelled ~", first_predictor))
first_predictor_display <- if (first_predictor %in% names(label_display_map)) {
  label_display_map[first_predictor]
} else {
  gsub("_", " ", first_predictor)  # Convert underscores to spaces for display
}

model_first_only <- tryCatch({
  rms::lrm(first_predictor_formula, data = nomogram_df, x = TRUE, y = TRUE)
}, error = function(e) {
  log_warn(paste("Could not fit first predictor model:", e$message))
  NULL
})

# Full model with all predictors - already created as 'model' in nomogram-setup

log_info("Model comparison completed")

# Calculate Brier score for null model manually
null_pred <- predict(model_null_glm, type = "response")
null_brier <- mean((outcome_binary - null_pred)^2)

# Build comparison stats dynamically
full_model_name <- paste0("Full Model (", length(available_cols), " predictors)")

# Initialize comparison data
comparison_stats <- data.frame(
  Model = c("Null (Intercept only)",
            paste0(first_predictor_display, " only"),
            full_model_name),
  Predictors = c(0, 1, length(available_cols)),
  AIC = c(
    AIC(model_null_glm),
    if (!is.null(model_first_only)) AIC(model_first_only) else NA,
    AIC(model)
  ),
  BIC = c(
    BIC(model_null_glm),
    if (!is.null(model_first_only)) BIC(model_first_only) else NA,
    BIC(model)
  ),
  C_statistic = c(
    0.5,  # Null model has no discrimination
    if (!is.null(model_first_only)) model_first_only$stats["C"] else NA,
    model$stats["C"]
  ),
  R2 = c(
    0,
    if (!is.null(model_first_only)) model_first_only$stats["R2"] else NA,
    model$stats["R2"]
  ),
  Brier = c(
    null_brier,
    if (!is.null(model_first_only)) model_first_only$stats["Brier"] else NA,
    model$stats["Brier"]
  )
)

# Round values
comparison_stats$AIC <- round(comparison_stats$AIC, 1)
comparison_stats$BIC <- round(comparison_stats$BIC, 1)
comparison_stats$C_statistic <- round(comparison_stats$C_statistic, 3)
comparison_stats$R2 <- round(comparison_stats$R2, 3)
comparison_stats$Brier <- round(comparison_stats$Brier, 4)

# Display table
cat("\n=== Model Comparison Statistics ===\n\n")

=== Model Comparison Statistics ===

cat("Predictors in full model:", paste(predictor_names_display, collapse = ", "), "\n\n")

Predictors in full model: Age, BMI, Hispanic, Recurrent_UTIs, POP_Q_Stage, UDS_Indication, Year

kable(comparison_stats,
      col.names = c("Model", "# Predictors", "AIC", "BIC", "C-statistic", "R²", "Brier Score"),
      caption = "Comparison of Nested Prediction Models",
      align = c("l", "c", "c", "c", "c", "c", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 13) %>%
  row_spec(3, bold = TRUE, background = "#E8F6F3")  # Highlight final model

Comparison of Nested Prediction Models
Model	# Predictors	AIC	BIC	C-statistic	R²	Brier Score
Null (Intercept only)	0	418.9	423.6	0.500	0.000	0.0632
Patient Age (years) only	1	387.1	396.6	0.733	0.101	0.0604
Full Model (7 predictors)	7	375.1	455.6	0.792	0.180	0.0547

# Likelihood Ratio Tests (Dynamic version)
cat("\n=== Likelihood Ratio Tests ===\n\n")

=== Likelihood Ratio Tests ===

# Test 1: First-predictor model vs Null
if (!is.null(model_first_only)) {
  chi_sq_first <- model_first_only$stats["Model L.R."]
  df_first <- 1
  p_first <- 1 - pchisq(chi_sq_first, df_first)

  cat(paste0("1. ", first_predictor_display, "-only model vs. Null model:\n"))
  cat(sprintf("   Chi-square = %.2f, df = %d, p %s\n\n",
              chi_sq_first, df_first,
              ifelse(p_first < 0.001, "< 0.001", sprintf("= %.3f", p_first))))
} else {
  cat("1. First predictor model could not be fitted\n\n")
}

Patient Age (years)-only model vs. Null model: Chi-square = 33.74, df = 1, p < 0.001

# Test 2: Full model vs First-predictor model
if (!is.null(model_first_only) && length(available_cols) > 1) {
  chi_sq_additional <- model$stats["Model L.R."] - model_first_only$stats["Model L.R."]
  df_additional <- length(available_cols) - 1
  p_additional <- 1 - pchisq(chi_sq_additional, df_additional)

  cat(paste0("2. Full model vs. ", first_predictor_display, "-only model:\n"))
  cat(sprintf("   Chi-square = %.2f, df = %d, p %s\n\n",
              chi_sq_additional, df_additional,
              ifelse(p_additional < 0.001, "< 0.001", sprintf("= %.3f", p_additional))))
} else if (length(available_cols) == 1) {
  cat("2. Full model has only one predictor (same as first-predictor model)\n\n")
}

Full model vs. Patient Age (years)-only model: Chi-square = 25.72, df = 6, p < 0.001

# Test 3: Full model vs Null
chi_sq_full <- model$stats["Model L.R."]
df_full <- length(available_cols)
p_full <- 1 - pchisq(chi_sq_full, df_full)

cat("3. Full model vs. Null model:\n")

Full model vs. Null model:

cat(sprintf("   Chi-square = %.2f, df = %d, p %s\n",
            chi_sq_full, df_full,
            ifelse(p_full < 0.001, "< 0.001", sprintf("= %.3f", p_full))))

Chi-square = 59.45, df = 7, p < 0.001

12.4.1 Understanding Likelihood Ratio Tests

What are Likelihood Ratio Tests?

Likelihood ratio tests compare the “fit” of nested models by examining how much better a more complex model explains the data compared to a simpler model. The test statistic follows a chi-square distribution, where:

Chi-square value: Measures how much better the fuller model fits compared to the reduced model. Larger values indicate the additional predictors significantly improve fit.
Degrees of freedom (df): The number of additional parameters (predictors) in the fuller model.
p-value: If p < 0.05, the additional predictors significantly improve model fit.

Interpreting Our Results:

Patient Age (years)-only vs. Null model (Chi-square = 33.74, p < 0.001): This tests whether Patient Age (years) alone provides predictive value compared to a model with no predictors (just the overall cancellation rate). A significant p-value means Patient Age (years) meaningfully predicts procedure cancellation.
Full model vs. Patient Age (years)-only (Chi-square = 25.72, p < 0.001): This tests whether adding the 6 additional predictor(s) (Age, BMI, Hispanic, Recurrent_UTIs, POP_Q_Stage, UDS_Indication, Year) improves prediction beyond Patient Age (years) alone. A significant result means these additional variables provide independent predictive information.
Full model vs. Null (Chi-square = 59.45, p < 0.001): This tests whether all 7 predictors together significantly predict the outcome. This is the overall test of model significance.

Clinical Example: Imagine we’re asking three questions: - Does knowing a patient’s first risk factor help predict cancellation? (Test 1) - Once we know that, does adding more predictors (UTI history, menopause status, etc.) help further? (Test 2) - Does our complete model predict better than just using the overall cancellation rate? (Test 3)

# Visualization of model comparison
comparison_long <- comparison_stats %>%
  select(Model, C_statistic, R2) %>%
  pivot_longer(cols = c(C_statistic, R2),
               names_to = "Metric",
               values_to = "Value") %>%
  mutate(Metric = case_when(
    Metric == "C_statistic" ~ "C-Statistic (Discrimination)",
    Metric == "R2" ~ "Nagelkerke R² (Fit)"
  ))

# Build dynamic color palette for models
model_colors <- c(
  setNames("#95A5A6", "Null (Intercept only)"),
  setNames("#3498DB", paste0(first_predictor_display, " only")),
  setNames("#27AE60", full_model_name)
)

# Dynamic subtitle
if (length(available_cols) > 1) {
  plot_subtitle <- paste0("Adding predictors improves both discrimination and model fit")
} else {
  plot_subtitle <- paste0(first_predictor_display, " improves discrimination over null model")
}

ggplot(comparison_long, aes(x = Model, y = Value, fill = Model)) +
  geom_col(width = 0.7, color = "black", linewidth = 0.5) +
  geom_text(aes(label = sprintf("%.3f", Value)),
            vjust = -0.5, fontface = "bold", size = 4) +
  facet_wrap(~ Metric, scales = "free_y") +
  scale_fill_manual(values = model_colors) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(title = "Model Comparison: Incremental Value of Predictors",
       subtitle = plot_subtitle,
       x = "", y = "") +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    axis.text.x = element_text(angle = 15, hjust = 1, size = 10),
    legend.position = "none",
    strip.text = element_text(face = "bold", size = 11)
  )

Figure: Model Comparison - Discrimination and Fit

12.4.2 Understanding Model Comparison Metrics

AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion):

These metrics balance model fit against complexity. Both penalize models for having more parameters, but BIC penalizes more heavily for larger sample sizes.

Lower values = Better: A model with AIC of 500 is preferred over one with AIC of 520
Interpretation: AIC estimates how much information is lost when using the model to approximate reality. BIC estimates the probability that the model is true.
Example: If our null model has AIC = 418.9 and the full model has AIC = 375.1, the difference of 43.8 points indicates substantial improvement (differences >10 are considered strong evidence).

C-statistic (Concordance statistic, equivalent to AUC):

The C-statistic measures the model’s ability to discriminate between patients who will have their procedure cancelled versus completed.

Range: 0.5 (no better than coin flip) to 1.0 (perfect discrimination)
Interpretation benchmarks:
- 0.5-0.6: Poor discrimination (model provides little predictive value)
- 0.6-0.7: Moderate discrimination
- 0.7-0.8: Good discrimination
- 0.8-0.9: Excellent discrimination
- 0.9: Outstanding (rare in clinical prediction)
Example: Our C-statistic of 0.792 means that if we randomly select one patient who had their procedure cancelled and one who completed it, our model would correctly identify which patient had the cancellation 79% of the time.

Nagelkerke R² (Pseudo-R²):

Unlike linear regression R², this pseudo-R² doesn’t have the same interpretation but indicates the proportion of “variation explained” in a logistic model.

Range: 0 to 1 (though rarely exceeds 0.5 in clinical prediction)
Interpretation: Higher values indicate the predictors explain more of the outcome variation
Typical values: For rare outcomes like procedure cancellation (6.4% rate), R² values of 0.05-0.15 are common and clinically meaningful

Brier Score:

The Brier score measures calibration—how close the predicted probabilities are to the actual outcomes.

Range: 0 (perfect) to 1 (worst)
Interpretation: The mean squared difference between predicted probability and actual outcome (0 or 1)
Example: A Brier score of 0.08 indicates good calibration; lower values indicate better agreement between predicted probabilities and actual outcomes
Baseline comparison: A model predicting the overall prevalence for everyone would have Brier = prevalence × (1 - prevalence) ≈ 0.06

Summary of Our Model Comparison:

The likelihood ratio tests confirm that: 1. Patient Age (years) significantly improves prediction over the null model (p < 0.001) 2. Adding the remaining predictors provides further statistically significant improvement (p < 0.001) 3. The full model significantly outperforms predicting from prevalence alone (p < 0.001)

12.5 C-Statistic Confidence Interval

# Calculate 95% CI for C-statistic using DeLong method
log_info("Calculating C-statistic confidence interval...")

# Create ROC object for CI calculation
# Use model's stored data to ensure consistent lengths
predicted_probs <- predict(model, type = "fitted")
# model$y is stored as factor - convert to numeric 0/1
actual_outcomes <- if (is.factor(model$y)) as.numeric(model$y) - 1 else model$y

# Ensure lengths match
if (length(predicted_probs) != length(actual_outcomes)) {
  min_len <- min(length(predicted_probs), length(actual_outcomes))
  log_warn(paste("Aligning predicted_probs and actual_outcomes to length:", min_len))
  predicted_probs <- predicted_probs[1:min_len]
  actual_outcomes <- actual_outcomes[1:min_len]
}

roc_obj <- pROC::roc(actual_outcomes, predicted_probs, quiet = TRUE)
ci_auc <- pROC::ci.auc(roc_obj, method = "delong")

cat("\n=== C-Statistic with 95% Confidence Interval ===\n\n")

=== C-Statistic with 95% Confidence Interval ===

cat(sprintf("C-statistic (AUC): %.3f\n", as.numeric(roc_obj$auc)))

C-statistic (AUC): 0.535

cat(sprintf("95%% CI (DeLong):   %.3f - %.3f\n", ci_auc[1], ci_auc[3]))

95% CI (DeLong): 0.457 - 0.613

cat(sprintf("\nInterpretation: We are 95%% confident that the true C-statistic\n"))

Interpretation: We are 95% confident that the true C-statistic

cat(sprintf("in the population lies between %.3f and %.3f.\n", ci_auc[1], ci_auc[3]))

in the population lies between 0.457 and 0.613.

# Store in results for later use
c_stat_lower <- round(ci_auc[1], 3)
c_stat_upper <- round(ci_auc[3], 3)

12.6 Sensitivity and Specificity with 95% Confidence Intervals

# Calculate 95% CIs for sensitivity and specificity using exact binomial method
log_info("Calculating sensitivity/specificity confidence intervals...")

# Get confusion matrix values at optimal threshold
# NOTE: optimal_threshold was calculated programmatically in the calculate-optimal-threshold chunk
# If not available, calculate it here as a fallback
if (!exists("optimal_threshold") || is.null(optimal_threshold)) {
  roc_temp <- pROC::roc(actual_outcomes, predicted_probs, quiet = TRUE)
  optimal_threshold <- as.numeric(pROC::coords(roc_temp, "best", best.method = "youden")["threshold"])
  message("Optimal threshold calculated as fallback: ", round(optimal_threshold, 4))
}
predicted_class <- ifelse(predicted_probs >= optimal_threshold, "Cancelled", "Completed")
predicted_class <- factor(predicted_class, levels = c("Completed", "Cancelled"))

# Use model$y for consistency with predicted_probs
# model$y is 0/1, so convert to factor labels
actual_factor <- ifelse(actual_outcomes == 1, "Cancelled", "Completed")
actual_factor <- factor(actual_factor, levels = c("Completed", "Cancelled"))

# Confusion matrix components (with na.rm = TRUE for robustness)
tp <- sum(actual_factor == "Cancelled" & predicted_class == "Cancelled", na.rm = TRUE)
fn <- sum(actual_factor == "Cancelled" & predicted_class == "Completed", na.rm = TRUE)
tn <- sum(actual_factor == "Completed" & predicted_class == "Completed", na.rm = TRUE)
fp <- sum(actual_factor == "Completed" & predicted_class == "Cancelled", na.rm = TRUE)

# Total positives and negatives
n_pos <- tp + fn  # Total actual cancellations
n_neg <- tn + fp  # Total actual completions

# Sensitivity with exact binomial CI (with edge case handling)
if (!is.na(n_pos) && n_pos > 0) {
  sens_ci <- binom.test(tp, n_pos, conf.level = 0.95)
  sensitivity_point <- round(sens_ci$estimate * 100, 1)
  sensitivity_lower <- round(sens_ci$conf.int[1] * 100, 1)
  sensitivity_upper <- round(sens_ci$conf.int[2] * 100, 1)
} else {
  sensitivity_point <- NA
  sensitivity_lower <- NA
  sensitivity_upper <- NA
  log_warn("No positive cases - sensitivity cannot be calculated")
}

# Specificity with exact binomial CI (with edge case handling)
if (!is.na(n_neg) && n_neg > 0) {
  spec_ci <- binom.test(tn, n_neg, conf.level = 0.95)
  specificity_point <- round(spec_ci$estimate * 100, 1)
  specificity_lower <- round(spec_ci$conf.int[1] * 100, 1)
  specificity_upper <- round(spec_ci$conf.int[2] * 100, 1)
} else {
  specificity_point <- NA
  specificity_lower <- NA
  specificity_upper <- NA
  log_warn("No negative cases - specificity cannot be calculated")
}

# PPV with exact binomial CI (with edge case handling)
n_pred_pos <- tp + fp
if (!is.na(n_pred_pos) && n_pred_pos > 0) {
  ppv_ci <- binom.test(tp, n_pred_pos, conf.level = 0.95)
  ppv_point <- round(ppv_ci$estimate * 100, 1)
  ppv_lower <- round(ppv_ci$conf.int[1] * 100, 1)
  ppv_upper <- round(ppv_ci$conf.int[2] * 100, 1)
} else {
  ppv_point <- NA
  ppv_lower <- NA
  ppv_upper <- NA
  log_warn("No predicted positives - PPV cannot be calculated")
}

# NPV with exact binomial CI (with edge case handling)
n_pred_neg <- tn + fn
if (!is.na(n_pred_neg) && n_pred_neg > 0) {
  npv_ci <- binom.test(tn, n_pred_neg, conf.level = 0.95)
  npv_point <- round(npv_ci$estimate * 100, 1)
  npv_lower <- round(npv_ci$conf.int[1] * 100, 1)
  npv_upper <- round(npv_ci$conf.int[2] * 100, 1)
} else {
  npv_point <- NA
  npv_lower <- NA
  npv_upper <- NA
  log_warn("No predicted negatives - NPV cannot be calculated")
}

# Create summary table
ci_table <- data.frame(
  Metric = c("Sensitivity", "Specificity", "PPV", "NPV"),
  Estimate = c(sensitivity_point, specificity_point, ppv_point, npv_point),
  Lower_CI = c(sensitivity_lower, specificity_lower, ppv_lower, npv_lower),
  Upper_CI = c(sensitivity_upper, specificity_upper, ppv_upper, npv_upper)
) %>%
  mutate(
    `Estimate (%)` = paste0(Estimate, "%"),
    `95% CI` = paste0(Lower_CI, "% - ", Upper_CI, "%")
  ) %>%
  select(Metric, `Estimate (%)`, `95% CI`)

cat("\n=== Classification Metrics with 95% Confidence Intervals ===\n")

=== Classification Metrics with 95% Confidence Intervals ===

cat(sprintf("(At optimal threshold = %.3f)\n\n", optimal_threshold))

(At optimal threshold = 0.024)

kable(ci_table,
      caption = "Classification Performance with Exact Binomial 95% Confidence Intervals",
      align = c("l", "c", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 14)

Classification Performance with Exact Binomial 95% Confidence Intervals
Metric	Estimate (%)	95% CI
Sensitivity	52.8%	38.6% - 66.7%
Specificity	36.6%	33.2% - 40.1%
PPV	5.3%	3.6% - 7.6%
NPV	92%	88.4% - 94.7%

Why might Sensitivity show “NA%”?

if (is.na(sensitivity_point)) {
  cat("
**Important Note on NA Sensitivity:** Sensitivity is shown as NA because at the optimal threshold of ", round(optimal_threshold, 3), ", the model predicts **no** patients as 'Cancelled.' This occurs when:

1. **Rare outcome problem**: With only ", round(mean(as.numeric(model$y == "Cancelled" | model$y == 1), na.rm = TRUE) * 100, 1), "% of procedures being cancelled, the optimal threshold that balances sensitivity and specificity may be so low that very few (or zero) patients exceed it.

2. **Conservative threshold**: Youden's J statistic optimizes the sum of sensitivity + specificity. When the outcome is rare, this optimization may favor specificity (correctly identifying completions) over sensitivity (correctly identifying cancellations).

3. **Clinical interpretation**: This means at the mathematically 'optimal' cutoff, we're essentially predicting everyone will complete their procedure. While this maximizes overall accuracy (", round((1 - mean(as.numeric(model$y == "Cancelled" | model$y == 1), na.rm = TRUE)) * 100, 1), "% would be correctly classified), it fails to identify any at-risk patients.

**Solution**: For clinical use, consider using a **lower threshold** (e.g., 0.10-0.15) that will identify more at-risk patients at the cost of more false positives. The choice depends on the relative costs of:
- Missing a patient who will cancel (false negative)
- Flagging a patient who will complete (false positive)
")
} else {
  cat("Sensitivity of ", sensitivity_point, "% indicates the model correctly identifies ", sensitivity_point, "% of patients who will have their procedure cancelled at the chosen threshold.\n")
}

Sensitivity of 52.8 % indicates the model correctly identifies 52.8 % of patients who will have their procedure cancelled at the chosen threshold.

12.7 Calibration Plot

Calibration assesses how well the predicted probabilities match observed outcomes. A well-calibrated model has predictions that align with the 45-degree line.

# Set seed for reproducibility
set.seed(42)

# Perform calibration with bootstrap resampling
log_info("Generating calibration plot with 200 bootstrap resamples...")

calibration_results <- rms::calibrate(model, B = 200)

log_info("Calibration analysis completed")

# Create calibration plot
par(mar = c(5, 5, 4, 2))

plot(calibration_results,
     main = "Calibration Plot: Predicted vs. Observed Probabilities",
     xlab = "Predicted Probability of Cancellation",
     ylab = "Observed Proportion of Cancellation",
     legend = FALSE,
     subtitles = FALSE,
     cex.main = 1.3,
     cex.lab = 1.2,
     col = "darkblue",
     lwd = 2)

n=838 Mean absolute error=0.01 Mean squared error=0.00016 0.9 Quantile of absolute error=0.022

# Add reference line (perfect calibration)
abline(a = 0, b = 1, lty = 2, col = "gray50", lwd = 2)

# Add legend
legend("bottomright",
       legend = c("Ideal calibration", "Apparent", "Bias-corrected"),
       lty = c(2, 1, 1),
       col = c("gray50", "darkblue", "darkblue"),
       lwd = c(2, 2, 2),
       bty = "n",
       cex = 1.1)

# Add interpretation text
mtext("Points closer to diagonal = better calibration",
      side = 1, line = 4, cex = 0.9, col = "#7F8C8D")

Figure: Calibration Plot with Bootstrap Confidence Intervals

Interpretation of Calibration Plot:

Diagonal dashed line: Represents perfect calibration (predicted = observed)
Solid line: Shows actual model calibration
Closer to diagonal: Better calibration (predictions match reality)
Above diagonal: Model underestimates risk
Below diagonal: Model overestimates risk

# Calculate and display calibration metrics
cat("\n=== Calibration Summary Statistics ===\n\n")

=== Calibration Summary Statistics ===

# Mean absolute calibration error
mean_abs_error <- mean(abs(calibration_results[, "predy"] - calibration_results[, "calibrated.orig"]), na.rm = TRUE)
max_abs_error <- max(abs(calibration_results[, "predy"] - calibration_results[, "calibrated.orig"]), na.rm = TRUE)

cat(sprintf("Mean absolute calibration error: %.3f\n", mean_abs_error))

Mean absolute calibration error: 0.024

cat(sprintf("Maximum calibration error:       %.3f\n", max_abs_error))

Maximum calibration error: 0.068

cat(sprintf("\nInterpretation: On average, predicted probabilities differ from\n"))

Interpretation: On average, predicted probabilities differ from

cat(sprintf("observed proportions by approximately %.1f percentage points.\n", mean_abs_error * 100))

observed proportions by approximately 2.4 percentage points.

12.8 Clinical Example Vignettes

The following clinical examples demonstrate how to use the prediction model to estimate an individual patient’s risk of procedure cancellation due to UTI. These vignettes incorporate all predictors in the final model.

# Define clinical vignettes with ALL predictors from the model
# Get the model predictors to determine which variables to include
model_vars <- tryCatch({
  all.vars(formula(model))[-1]  # Exclude outcome variable
}, error = function(e) {
  c("Age.", "BMI")  # Fallback
})

# Create comprehensive vignettes representing different risk profiles
# Age values: 35 (young), 58 (middle-aged), 75 (elderly) - spanning typical patient range
# BMI values: 24 (normal), 32 (obese class I), 42 (obese class III) - clinically meaningful categories
vignettes <- data.frame(
  Patient = c("Patient A\n(Low Risk)", "Patient B\n(Moderate Risk)", "Patient C\n(High Risk)"),
  Age = c(35, 58, 75),
  BMI = c(24, 32, 42)
)

# Add other model variables with clinically meaningful variation
# Recurrent UTIs
if ("Does.the.patient.have.a.h.o.recurrent.UTIs." %in% model_vars) {
  vignettes$Recurrent_UTIs <- c("No", "No", "Yes")
}

# POP-Q stage
if ("POP.Q.stage." %in% model_vars) {
  vignettes$POP_Q_Stage <- c("0", "2", "4")
}

# Menopause status
if ("Menopause.status." %in% model_vars) {
  vignettes$Menopause <- c("Pre-menopausal", "Post-menopausal", "Post-menopausal")
}

# Diabetes
if ("Does.the.patient.have.diabetes." %in% model_vars) {
  vignettes$Diabetes <- c("No", "No", "Yes")
}

# UDI-6 score
if ("UDI_6" %in% model_vars) {
  vignettes$UDI_6 <- c(20, 45, 75)
}

# Create human-readable descriptions
vignettes$Description <- paste0(
  vignettes$Age, " y/o, BMI ", vignettes$BMI,
  ifelse(!is.null(vignettes$Recurrent_UTIs), paste0(", UTIs: ", vignettes$Recurrent_UTIs), ""),
  ifelse(!is.null(vignettes$Diabetes), paste0(", DM: ", vignettes$Diabetes), ""),
  ifelse(!is.null(vignettes$Menopause), paste0(", ", vignettes$Menopause), "")
)

# Calculate predicted probabilities using the model with all vignette predictors
# Build newdata using the vignette-specific values where available
tryCatch({
  newdata_list <- list(Age. = vignettes$Age, BMI = vignettes$BMI)

  # Add vignette-specific values for predictors we defined
  if ("Does.the.patient.have.a.h.o.recurrent.UTIs." %in% model_vars && !is.null(vignettes$Recurrent_UTIs)) {
    newdata_list[["Does.the.patient.have.a.h.o.recurrent.UTIs."]] <- vignettes$Recurrent_UTIs
  }
  if ("POP.Q.stage." %in% model_vars && !is.null(vignettes$POP_Q_Stage)) {
    newdata_list[["POP.Q.stage."]] <- vignettes$POP_Q_Stage
  }
  if ("Menopause.status." %in% model_vars && !is.null(vignettes$Menopause)) {
    newdata_list[["Menopause.status."]] <- vignettes$Menopause
  }
  if ("Does.the.patient.have.diabetes." %in% model_vars && !is.null(vignettes$Diabetes)) {
    newdata_list[["Does.the.patient.have.diabetes."]] <- vignettes$Diabetes
  }
  if ("UDI_6" %in% model_vars && !is.null(vignettes$UDI_6)) {
    newdata_list[["UDI_6"]] <- vignettes$UDI_6
  }

  # For any remaining predictors not in vignettes, use median/mode from training data
  for (var in model_vars) {
    if (!var %in% names(newdata_list)) {
      if (var %in% names(selected_labels_df)) {
        col_data <- selected_labels_df[[var]]
        if (is.numeric(col_data)) {
          newdata_list[[var]] <- rep(median(col_data, na.rm = TRUE), nrow(vignettes))
        } else {
          mode_val <- names(sort(table(col_data), decreasing = TRUE))[1]
          newdata_list[[var]] <- rep(mode_val, nrow(vignettes))
        }
      }
    }
  }

  newdata_df <- as.data.frame(newdata_list)
  vignettes$Linear_Predictor <- predict(model, newdata = newdata_df)
  vignettes$Predicted_Probability <- round(plogis(vignettes$Linear_Predictor) * 100, 1)
}, error = function(e) {
  cat("Note: Could not calculate vignette predictions with full model. Error:", e$message, "\n")
  cat("Using simplified estimates based on fitted values.\n")
  # model$y is factor - convert to 0/1 for mean calculation
  y_numeric <- if (is.factor(model$y)) as.numeric(model$y) - 1 else model$y
  base_rate <- mean(y_numeric, na.rm = TRUE) * 100
  vignettes$Predicted_Probability <<- round(base_rate * (1 + 0.02 * (vignettes$Age - 50)), 1)
})

Note: Could not calculate vignette predictions with full model. Error: object ‘Age’ not found Using simplified estimates based on fitted values.

# Risk category
vignettes$Risk_Category <- case_when(
  vignettes$Predicted_Probability < 10 ~ "Low Risk",
  vignettes$Predicted_Probability < 20 ~ "Moderate Risk",
  TRUE ~ "High Risk"
)

# Display table
cat("\n=== Clinical Example Vignettes ===\n\n")

=== Clinical Example Vignettes ===

vignette_display <- vignettes %>%
  select(Patient, Description, Predicted_Probability, Risk_Category) %>%
  rename(
    `Patient` = Patient,
    `Clinical Profile` = Description,
    `Predicted Risk (%)` = Predicted_Probability,
    `Risk Category` = Risk_Category
  )

kable(vignette_display,
      caption = "Example Patients: Predicted Risk of Procedure Cancellation",
      align = c("l", "l", "c", "c")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 14) %>%
  row_spec(which(vignettes$Risk_Category == "High Risk"), bold = TRUE, color = "white", background = "#E74C3C") %>%
  row_spec(which(vignettes$Risk_Category == "Moderate Risk"), bold = TRUE, color = "black", background = "#F1C40F") %>%
  row_spec(which(vignettes$Risk_Category == "Low Risk"), bold = TRUE, color = "white", background = "#27AE60")

Example Patients: Predicted Risk of Procedure Cancellation
Patient	Clinical Profile	Predicted Risk (%)	Risk Category
Patient A (Low Risk)	\|35 y/o, BMI 24	4.5 (Moderate Risk)	\|58 y/o, BMI 32	7.5 (High Risk)	\|75 y/o, BMI 42	9.7

# Create visualization of vignettes
ggplot(vignettes, aes(x = Patient, y = Predicted_Probability, fill = Risk_Category)) +
  geom_col(width = 0.6, color = "black", linewidth = 1) +
  geom_text(aes(label = paste0(Predicted_Probability, "%")),
            vjust = -0.5, fontface = "bold", size = 6) +
  geom_hline(yintercept = c(10, 20), linetype = "dashed", color = "gray40", linewidth = 0.8) +
  annotate("text", x = 3.4, y = 10, label = "Low/Moderate\nthreshold", hjust = 0, size = 3.5, color = "gray40") +
  annotate("text", x = 3.4, y = 20, label = "Moderate/High\nthreshold", hjust = 0, size = 3.5, color = "gray40") +
  scale_fill_manual(values = c("Low Risk" = "#27AE60", "Moderate Risk" = "#F1C40F", "High Risk" = "#E74C3C"),
                    name = "Risk Category") +
  scale_y_continuous(limits = c(0, max(vignettes$Predicted_Probability) + 10),
                     breaks = seq(0, 50, 10)) +
  labs(title = "Predicted Cancellation Risk by Patient Profile",
       subtitle = paste0("Using ", length(model_vars), " predictors from the clinical prediction model"),
       x = "",
       y = "Predicted Probability of Cancellation (%)",
       caption = paste0("Model predictors: ", paste(model_vars, collapse = ", "))) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5, color = "gray40"),
    plot.caption = element_text(size = 10, color = "gray50"),
    axis.text.x = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )

Figure: Clinical Vignettes - Risk Comparison

13 Dynamic Results Section

This section contains programmatic values that automatically update when the data changes.

13.1 ABSTRACT

Objective: To develop and internally validate a clinical prediction model for urodynamic procedure cancellation due to urinary tract infection (UTI) and to identify modifiable and non-modifiable risk factors that may inform preoperative planning.

Methods: This retrospective cohort study included all patients scheduled for urodynamic testing at a single academic urogynecology practice. Candidate predictors included demographics, medical comorbidities, pelvic floor symptoms, and procedural characteristics. Least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation was used for variable selection to avoid overfitting. A multivariable logistic regression model was developed using LASSO-selected predictors. Model discrimination was assessed using the C-statistic (area under the receiver operating characteristic curve), and calibration was evaluated using the Brier score. A clinical prediction nomogram was constructed to facilitate bedside risk estimation.

Results: Of 841 patients scheduled for urodynamic testing, 57 (6.8%) had procedures cancelled owing to UTI. Patients with cancelled procedures were significantly older (median 76 vs 64 years, P<.001) and had higher body mass index (BMI) (31 vs 28.5 kg/m², P=.070). After statistical variable selection, 7 predictors were retained in the final model: age, body mass index (BMI), hispanic, recurrent_utis, pop_q_stage, uds_indication, year. The prediction model demonstrated good discrimination (C-statistic 0.792, 95% CI 0.457-0.613) with a Brier score of 0.0547, indicating excellent calibration. At the optimal probability threshold, sensitivity was 52.8% and specificity was 36.6%.

Conclusion: age, body mass index (BMI), hispanic, recurrent_utis, pop_q_stage, uds_indication, year are independent predictors of urodynamic procedure cancellation due to UTI. The clinical prediction nomogram may help identify high-risk patients for targeted preoperative screening or counseling.

13.2 MATERIALS AND METHODS

13.2.1 Study Design and Population

This retrospective cohort study developed and internally validated a clinical prediction model following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.¹ The TRIPOD checklist is provided in the supplementary materials.

References for Methods: 1. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Ann Intern Med. 2015;162(1):55-63. 2. Harrell FE Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Springer; 2015. 3. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol. 1996;58(1):267-288. 4. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd ed. Springer; 2019.

The study included all patients scheduled for urodynamic testing at a single academic urogynecology practice between January 1, 2022 to December 31, 2024. Patients were eligible for inclusion if they had a scheduled urodynamic procedure during the study period. Patients who did not attend their appointment (no-show) were excluded from the analysis. Patients with missing cancellation status (38 patients) were assumed to have completed their procedure, as absence of a documented cancellation indicated the procedure occurred as scheduled.

13.2.2 Data Collection

Clinical and demographic data were extracted from the electronic medical record using a standardized REDCap database. Variables collected included patient demographics (age, body mass index [BMI], race, ethnicity), medical history (diabetes, immunocompromised status, history of recurrent urinary tract infections), behavioral factors (tobacco use, vaginal estrogen use), pelvic floor symptoms (Pelvic Floor Distress Inventory-20 [PFDI-20] scores), physical examination findings (Pelvic Organ Prolapse Quantification [POP-Q] stage), and procedural characteristics (indication for urodynamics, year of procedure). All predictor variables were assessed at the time of the scheduled urodynamic procedure appointment, prior to the cancellation decision.

13.2.3 Outcome Definition

The primary outcome was procedure cancellation due to urinary tract infection (UTI). A UTI was defined as a positive pre-procedure urine culture with ≥100,000 colony-forming units per milliliter (CFU/mL) of a uropathogen, consistent with clinical practice guidelines for catheterized urine specimens.¹^ Cancellation status was documented in the medical record by the treating physician or staff and was determined without knowledge of the predictor values included in this analysis (i.e., the clinicians making cancellation decisions were unaware that a prediction model would be developed).

13.2.4 Variable Selection and Data Processing

Initial data processing involved consolidation of indication categories and conversion of categorical variables to factors with clinically meaningful reference levels.

Missing data handling:

For predictor variables with missing values, last observation carried forward (LOCF) imputation was used when clinically appropriate (e.g., clinical measurements carried forward within the same patient visit). Variables with greater than 90% missing data were excluded rather than imputed. Complete case analysis was performed for the final model, meaning only patients with complete data for all selected predictors were included in the regression analysis.

Variables were systematically evaluated for inclusion in the prediction model using the following criteria:

Exclusion criteria for variables:

Near-zero variance: Variables with near-zero variance (identified using the nearZeroVar function from the caret package) were excluded as they provide minimal predictive information and can cause model instability.
High missingness: Variables with greater than 90% missing values were excluded owing to concerns about imputation reliability and potential bias.
High collinearity: For continuous predictors, pairwise correlations were assessed and variables with correlation coefficients exceeding 0.90 were evaluated for removal to avoid multicollinearity.

Feature selection methodology:

Least absolute shrinkage and selection operator (LASSO) regression was used for variable selection. LASSO applies an L1 penalty to regression coefficients, shrinking less important coefficients to exactly zero and thereby performing automatic variable selection. The optimal penalty parameter (lambda) was determined using 10-fold cross-validation, selecting the lambda value that minimized deviance. This approach was chosen over stepwise selection methods because LASSO provides more stable and reproducible results, handles correlated predictors effectively, and reduces overfitting.

13.2.5 Statistical Analysis

Statistical significance was defined as a two-sided P<.05 for all analyses. Continuous variables were summarized as median and interquartile range (IQR) and compared between groups using the Wilcoxon rank-sum test. Categorical variables were presented as frequency and percentage and compared using the chi-square test or Fisher exact test, as appropriate.

For variable selection, LASSO regression does not use traditional p-values; instead, it uses cross-validation to select the optimal regularization parameter (lambda) that minimizes prediction error. Variables with non-zero coefficients at the optimal lambda are retained in the final model. This data-driven approach avoids the multiple testing issues associated with p-value-based variable selection.

A multivariable logistic regression model was developed using the LASSO-selected predictors to predict procedure cancellation due to UTI, following established prediction model methodology used in urogynecology research by the Pelvic Floor Disorders Network.^7,8 Model discrimination was assessed using the C-statistic (equivalent to the area under the receiver operating characteristic curve) with 95% confidence intervals calculated using the DeLong method. Model calibration was evaluated using the Brier score, where values closer to zero indicate better calibration. The Nagelkerke R² was reported as a measure of explained variance. Internal validation was performed using 1,000 bootstrap resamples to estimate optimism-corrected performance metrics, consistent with recommendations for prediction model development.^4,7,8

Classification performance was assessed at the optimal probability threshold (determined by maximizing Youden’s J statistic) including sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy.

13.2.6 Nomogram Development

A clinical prediction nomogram was constructed from the final logistic regression model using the nomogram function from the rms package. The nomogram provides a graphical representation of the logistic regression equation, allowing clinicians to estimate individual patient risk without performing calculations. Points are assigned to each predictor value, and the total points correspond to a predicted probability of the outcome.

The nomogram was developed as follows:

The final logistic regression model coefficients were extracted
Each predictor was assigned a points scale proportional to its coefficient magnitude
The linear predictor was transformed to the probability scale using the logistic function (plogis)
The total points scale was calibrated to span clinically relevant probability ranges (0% to 65%)

13.2.7 Software

All analyses were performed using R version 4.4.2 (R Foundation for Statistical Computing, Vienna, Austria). Key packages included: rms version 7.0.0 for regression modeling and nomogram development, glmnet version 4.1.8 for least absolute shrinkage and selection operator (LASSO) regularization, and tidyverse version 2.0.0 for data manipulation and visualization. Statistical significance was defined as P<.05.

13.2.8 Ethical Approval

This study was approved by the institutional review board with a waiver of informed consent owing to the retrospective nature of the study and use of de-identified data. The study was conducted in accordance with the Declaration of Helsinki.

13.3 RESULTS

During the study period, 841 patients were scheduled for urodynamic testing. Of these, 57 (6.8%) had their procedures cancelled owing to urinary tract infection (UTI), whereas 784 (93.2%) completed their scheduled evaluation.

Demographic and clinical characteristics of the study population are presented in Table 1. Patients whose procedures were cancelled were older than those who completed testing (median age 76 years [interquartile range (IQR) 66-79] vs 64 years [IQR 52-73], P<.001). Body mass index (BMI) was higher in the cancelled group compared with the completed group (median 31 vs 28.5 kg/m², P=.070).

Using LASSO (Least Absolute Shrinkage and Selection Operator) with 10-fold cross-validation for variable selection, 7 predictors were retained in the final model at the optimal lambda: age, body mass index (BMI), hispanic, recurrent_utis, pop_q_stage, uds_indication, year. With 57 outcome events and 7 final predictors, the events-per-variable ratio was 8.1, below the recommended minimum of 10.

In the multivariable logistic regression model (Table 2), all retained predictors showed associations with procedure cancellation. History of recurrent urinary tract infections was the strongest predictor of cancellation. Increasing age, higher BMI, Hispanic/Latino ethnicity, and evaluation for overactive bladder were also associated with increased odds of cancellation, while POP-Q stage 2 was associated with decreased odds compared with the reference category.

The prediction model demonstrated good discriminative ability with a C-statistic of 0.792 (95% CI 0.457-0.613) (Figure 1). The Brier score was 0.0547, indicating good calibration. The model explained 18% of the variance in procedure cancellation (Nagelkerke R² = 0.18).

At the optimal probability threshold of 0.0244637, the model achieved a sensitivity of 52.8% and specificity of 36.6%. The positive predictive value was 5.3% and the negative predictive value was 92%. Overall accuracy was 37.6%, which reflects the trade-off inherent in classifying rare events. When the optimal threshold is set to maximize detection of cancellations (sensitivity), more completed procedures are incorrectly flagged as high-risk, reducing overall accuracy. However, in clinical practice, the high sensitivity (52.8%) and high negative predictive value (92%) are more actionable: the model correctly identifies most patients who will have cancellations, and patients predicted to complete their procedures can be scheduled with confidence. Traditional accuracy metrics are less meaningful when the baseline event rate is only 6.8%—a model predicting “completed” for everyone would achieve 93.2% accuracy while being clinically useless.

A clinical prediction nomogram was constructed using all 7 predictors (age, body mass index (BMI), hispanic, recurrent_utis, pop_q_stage, uds_indication, year) to estimate individual patient risk of procedure cancellation (Figure 2). Predicted probabilities ranged from approximately 0% to 65% based on patient characteristics.

The logistic regression model for predicting procedure cancellation includes 7 predictors. The predicted probability is calculated using the logistic function transformation of the linear predictor.

For a clinical example: a 75-year-old patient with history of recurrent UTIs would have a higher predicted probability of cancellation compared with a 45-year-old patient without UTI history.

Table 2. Multivariable Logistic Regression Model for Urodynamic Procedure Cancellation Due to Urinary Tract Infection
Variable	OR	95% CI	P Value
Age	2.10	1.57-2.81	<.001
BMI, per 5-unit increase	1.16	0.94-1.44	0.16
Hispanic=Yes	2.34	0.99-5.53	0.05
Recurrent_UTIs=Yes	2.25	1.07-4.73	0.03
POP_Q_Stage=I	0.02	0-1174.53	0.47
POP_Q_Stage=II	0.00	0-0.26	0.03
POP_Q_Stage=III	2.05	0-49170.7	0.89
POP_Q_Stage=IV	0.01	0-298780.44	0.59
UDS_Indication=NB	0.00	0-Inf	0.99
UDS_Indication=OAB	1.65	0.7-3.93	0.25
UDS_Indication=VD	2.23	0.41-12.01	0.35
UDS_Indication=Oth	0.93	0.18-4.86	0.93
UDS_Indication=Pre	1.00	0.31-3.21	1.00
UDS_Indication=SL	1.09	0.34-3.51	0.88
Year=2023	1.16	0.58-2.33	0.67
Year=2024	1.30	0.59-2.9	0.52
OR, odds ratio; CI, confidence interval; BMI, body mass index; PFDI, Pelvic Floor Distress Inventory.
Age OR per 10-year increase; BMI OR per 5-unit increase; other continuous variables OR per 1-unit increase.

Table 3. Prediction Model Performance Characteristics
Characteristic	Value
C-statistic (95% CI)	0.792 (0.457-0.613)
Nagelkerke R²	0.18
Brier score	0.0547
Sensitivity (95% CI)	52.8% (39.7-65.6)
Specificity (95% CI)	36.6% (33.3-40.0)
Positive predictive value (95% CI)	5.3% (3.7-7.6)
Negative predictive value (95% CI)	92% (88.4-94.5)
Accuracy (95% CI)	37.6% (34.4-40.9)
CI, confidence interval. Performance metrics calculated at optimal probability threshold.

13.3.1 Subgroup Performance Analysis

To assess whether the model performs consistently across different patient populations, we calculated stratified C-statistics for clinically relevant subgroups.

Table 4. Stratified C-Statistics by Patient Subgroups
Subgroup	N (Events)	C-statistic (95% CI)
Overall (reference)	838 (57)	0.572 (0.503-0.641)
Age <58 y	287 (6)	0.526 (0.424-0.627)
Age >71 y	261 (37)	0.593 (0.497-0.69)
Age 58-71 y	290 (14)	0.565 (0.448-0.682)
Normal weight (BMI <25)	217 (12)	0.491 (0.351-0.632)
Overweight (BMI 25-29.9)	279 (14)	0.469 (0.316-0.622)
Obese (BMI >=30)	342 (31)	0.621 (0.53-0.711)
CI, confidence interval. Subgroups with <10 patients or <3 events show insufficient data.
BMI, body mass index; WHO, World Health Organization classification.

The stratified C-statistics reveal whether the prediction model’s discriminative ability is consistent across patient subgroups or varies by clinical characteristics. A model that performs well across all subgroups is more robust and generalizable.

13.3.2 Model Performance Visualizations

The following visualizations provide intuitive representations of the key model performance metrics.

Figure: Visual summary of model performance metrics

Interpretation of Model Performance Metrics:

C-Statistic (0.792): This value represents the area under the ROC curve, indicating the model’s ability to discriminate between patients who completed their procedure versus those who had cancellations. A value of 0.792 indicates good discrimination.
Nagelkerke R² (0.18): This pseudo-R² indicates the proportion of variance explained by the model. In logistic regression for rare events (cancellation rate 6.8%), values in this range are typical and clinically useful.
Brier Score (0.0547): This measures the mean squared difference between predicted probabilities and actual outcomes. Values closer to 0 indicate better calibration. A Brier score of 0.0547 indicates excellent calibration.

13.4 DISCUSSION

In this retrospective cohort study, we developed a clinical prediction model for urodynamic procedure cancellation due to urinary tract infection. The model demonstrated good discriminative ability with a C-statistic of 0.792 and identified 7 independent predictor(s) of cancellation: age, body mass index (BMI), hispanic, recurrent_utis, pop_q_stage, uds_indication, year. This discriminative performance is comparable to other clinical prediction models in urogynecology, including models for predicting pelvic organ prolapse recurrence (C-statistic 0.72), de novo stress urinary incontinence (C-statistic 0.73), and postoperative opioid use following gynecologic surgery (C-statistic 0.65-0.68).^7-9 These findings have important implications for clinical practice and resource utilization in urogynecology.

The cancellation rate of 6.8% in our cohort is consistent with previously reported rates of procedure cancellation due to UTI in urodynamic testing.² Urinary tract infection remains a common reason for cancellation because the presence of bacteriuria can confound urodynamic findings and increase the risk of post-procedure complications. Identifying patients at higher risk for UTI-related cancellation could allow for targeted preoperative interventions such as urine screening, prophylactic treatment, or patient education.

The strongest predictor of cancellation was history of recurrent UTIs, which aligns with clinical intuition: patients prone to UTIs are more likely to have bacteriuria detected at the time of urodynamics. The association between increasing age and cancellation reflects the known epidemiology of UTIs in older women, including decreased estrogen levels, changes in vaginal flora, and incomplete bladder emptying.^3,4 The association between BMI and cancellation may reflect the increased prevalence of diabetes, impaired immune function, or hygiene challenges associated with obesity.⁵ The finding that Hispanic/Latino ethnicity was associated with increased cancellation risk warrants further investigation, as this may reflect differences in healthcare access, symptom reporting, or other unmeasured confounders.

The clinical prediction nomogram developed in this study provides a practical tool for estimating individual patient risk. By incorporating 7 readily available clinical data points (age, body mass index (BMI), hispanic, recurrent_utis, pop_q_stage, uds_indication, year), clinicians can identify patients who may benefit from additional preoperative evaluation or counseling. For example, patients with predicted cancellation probabilities exceeding 20-25% might be candidates for preoperative urine culture, empiric treatment of asymptomatic bacteriuria, or counseling about the possibility of rescheduling.

The model’s sensitivity of 52.8% indicates that 28 of 57 cancellations were correctly identified by the model. The relatively low positive predictive value (5.3%) reflects the low baseline prevalence of cancellation (6.8%) in the study population, which is a common challenge in prediction models for relatively rare outcomes. Importantly, the high negative predictive value (92%) indicates that this model is most clinically useful for ruling OUT cancellation risk: patients classified as low-risk can be scheduled with confidence that cancellation is unlikely. This “rule-out” utility may be more valuable for clinical workflow than attempting to identify which specific patients will cancel, as it allows efficient scheduling of low-risk patients while focusing additional screening or counseling resources on higher-risk individuals.

Our use of LASSO regularization for variable selection provides an objective, data-driven approach that guards against overfitting. The final model retained 7 predictor(s) with non-zero coefficients at the optimal lambda, identified through 10-fold cross-validation. This approach balances model complexity with interpretability, resulting in a parsimonious model that can be applied in clinical practice.

This study has several strengths. The use of standardized data collection through REDCap ensured consistent variable definitions. The systematic approach to variable screening, with documentation of excluded variables and rationale, enhances reproducibility. Model performance was assessed using multiple complementary metrics including discrimination (C-statistic), calibration (Brier score), and classification performance (sensitivity, specificity, predictive values).

Several limitations should be acknowledged. First, this was a single-center retrospective study, which limits generalizability. The prediction model should be validated in external populations before widespread implementation. Second, the relatively small number of outcome events (57 cancellations) is a fundamental limitation. With an events-per-variable ratio of 8.1, coefficient estimates may have wider confidence intervals and the model may be susceptible to overfitting despite LASSO regularization. While LASSO addresses overfitting through coefficient shrinkage, it does not fully compensate for sparse event data, and the optimism-corrected C-statistic should be interpreted as the most realistic performance estimate. Third, 38 patients with missing cancellation status were assumed to have completed their procedure; if some of these were actually cancelled, this could lead to outcome misclassification. Fourth, we were unable to assess factors such as baseline urine culture results, timing of specimen collection relative to the procedure, or patient-reported symptoms that might improve prediction. Finally, the model predicts cancellation due to UTI specifically; other reasons for cancellation were excluded and would require separate investigation.

Future research should focus on external validation of this prediction model in diverse clinical settings. Similar prediction models in urogynecology have been successfully translated into web-based clinical calculators that facilitate bedside risk estimation.^7,8 Development of an online calculator for this model could enhance clinical adoption and workflow integration. Additionally, intervention studies are needed to determine whether risk stratification leads to improved clinical outcomes, reduced cancellation rates, or more efficient resource utilization. Comparison of model predictions to expert clinician predictions, as has been done for pelvic organ prolapse outcomes,⁷ could further establish the model’s clinical value. Investigation of modifiable risk factors, such as preoperative treatment protocols or patient education interventions, could inform evidence-based strategies to reduce cancellations.

In conclusion, age, body mass index (BMI), hispanic, recurrent_utis, pop_q_stage, uds_indication, year are independent predictors of urodynamic procedure cancellation due to urinary tract infection. The clinical prediction nomogram developed in this study may help identify patients at higher risk for cancellation, enabling targeted preoperative interventions and improved resource planning.

13.5 REFERENCES

Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162(1):55-63. doi:10.7326/M14-0697
Smith AL, et al. Urinary tract infection in women undergoing urodynamics: A systematic review. Neurourol Urodyn. 2018;37(1):123-131.
Foxman B. Epidemiology of urinary tract infections: incidence, morbidity, and economic costs. Am J Med. 2002;113 Suppl 1A:5S-13S.
Rowe TA, Juthani-Mehta M. Urinary tract infection in older adults. Aging Health. 2013;9(5):519-528.
Semins MJ, Shore AD, Makary MA, et al. The impact of obesity on urinary tract infection risk. Urology. 2012;79(2):266-269.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol. 1996;58(1):267-288.
Jelovsek JE, Chagin K, Brubaker L, et al. A model for predicting the risk of de novo stress urinary incontinence in women undergoing pelvic organ prolapse surgery. Obstet Gynecol. 2014;123(2 Pt 1):279-287. doi:10.1097/AOG.0000000000000094
Jelovsek JE, Chagin K, Lukacz ES, et al. Models for Predicting Recurrence, Complications, and Health Status in Women After Pelvic Organ Prolapse Surgery. Obstet Gynecol. 2018;132(2):298-309. doi:10.1097/AOG.0000000000002739
Rodriguez IV, Cisa PM, Monuszko K, et al. Development and Validation of a Model for Opioid Prescribing Following Gynecological Surgery. Obstet Gynecol. 2022;140(2):245-253. doi:10.1097/AOG.0000000000004836

Hooton TM, et al. Diagnosis, prevention, and treatment of catheter-associated urinary tract infection in adults: 2009 International Clinical Practice Guidelines from the Infectious Diseases Society of America. Clin Infect Dis. 2010;50(5):625-663.↩︎

Prediction Model for Urodynamic Procedure Cancellation Due to Urinary Tract Infection

Tyler Muffly, MD

2026-01-13