Bootstrap Optimism Correction

Author

Tingting Zhan

Published

August 25, 2025

1 Introduction

This vignette provides examples of using R packages

to ….

R terminology may differ from that of mathematics and statistics. Please refer to Appendix Section 7 for explanations of, and references for, the terms and abbreviations used in this vignette.

Package boc Imports packages

Package boc Suggests packages

Package maxEff dependencies are outlined in its separate vignette (CRAN, RPubs).

1.1 Prerequisite

Packages boc and maxEff require R version 4.5.0 (released 2025-04-11) or higher (macOS, Windows). An Integrated Development Environment (IDE), e.g., RStudio (Posit team 2025) or Positron, is not required but is highly recommended. This vignette was created under R version 4.5.1 (2025-06-13) using packages knitr (Xie 2025, v1.50), quarto (Allaire and Dervieux 2024, v1.5.0 with Quarto v1.7.33) and rmarkdown (Allaire et al. 2024, v2.29).
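The version requirement can be checked before installing. The following is a minimal sketch in base R; the object names r_min and r_ok are only illustrative and are not part of either package.

```r
# minimum R version stated in this vignette
r_min = '4.5.0'

# getRversion() returns a version object that compares correctly against a version string
r_ok = (getRversion() >= r_min)

if (!r_ok) {
  message('R ', r_min, ' or higher is needed; current version is ', as.character(getRversion()))
}
```

A plain string comparison of version numbers can give the wrong answer (e.g., '4.10.0' sorts before '4.5.0' as text), which is why the comparison is done on the version object.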

Environment on author’s computer
Sys.info()[c('sysname', 'release', 'machine')]
#  sysname  release  machine 
# "Darwin" "24.6.0"  "arm64"
R.version
#                _                           
# platform       aarch64-apple-darwin20      
# arch           aarch64                     
# os             darwin20                    
# system         aarch64, darwin20           
# status                                     
# major          4                           
# minor          5.1                         
# year           2025                        
# month          06                          
# day            13                          
# svn rev        88306                       
# language       R                           
# version.string R version 4.5.1 (2025-06-13)
# nickname       Great Square Root

Experimental (and possibly unstable) features are released frequently to GitHub. Active developers should use the GitHub version; suggestions and bug reports are welcome! Stable releases to CRAN are typically updated every 2 to 3 months, or when the authors have a manuscript under peer review.

remotes::install_github('tingtingzhan/maxEff')
remotes::install_github('tingtingzhan/boc')
Developers, do NOT use the CRAN version!
# utils::install.packages('boc') # Developers, do NOT use!!
utils::install.packages('maxEff') # Developers, do NOT use!!

1.2 Getting Started

Examples in this vignette require the following packages on the search path:

library(boc)
# Registered S3 method overwritten by 'spatstat.explore':
#   method   from
#   plot.roc pROC
library(survival)

For the function name clash between spatstat.explore::plot.roc() and pROC::plot.roc(), see the detailed explanation in the maxEff package vignette (RPubs, CRAN).

1.3 Acknowledgement

This work is supported by National Institutes of Health, U.S. Department of Health and Human Services grants

2 Example

data(flchain, package = 'survival')
flchain2 = flchain |> 
  subset.data.frame(subset = (futime > 0)) |> # required by ?rpart::rpart
  subset.data.frame(subset = (chapter == 'Circulatory')) |>
  within.data.frame(expr = {
    mgus = as.logical(mgus)
    OS = Surv(futime, death)
    chapter = futime = death = NULL
  })
dim(flchain2) # 742
# [1] 742   9
m0 = coxph(OS ~ age + creatinine, data = flchain2)
A coxph model m0
m0
# Call:
# coxph(formula = OS ~ age + creatinine, data = flchain2)
# 
#                coef exp(coef) se(coef)     z        p
# age        0.031451  1.031950 0.004506 6.979 2.97e-12
# creatinine 0.402371  1.495366 0.051557 7.804 5.98e-15
# 
# Likelihood ratio test=84.07  on 2 df, p=< 2.2e-16
# n= 673, number of events= 673 
#    (69 observations deleted due to missingness)
nobs(m0) # 673, due to missingness in `creatinine`
# [1] 673

3 add_dummies()

m1 = m0 |>
  add_dummies(formula = ~ kappa + lambda)
m1
# kappa :
# function (newx = kappa) 
# {
#     ret <- (newx >= 2.455)
#     ret0 <- na.omit(ret)
#     if ((length(ret0) > 1L) && (all(ret0) || !any(ret0))) 
#         warning("Dichotomized values are all-0 or all-1")
#     return(ret)
# }
# <environment: 0x123e60ac0>
# attr(,"class")
# [1] "node1"    "function"
# 
# lambda :
# function (newx = lambda) 
# {
#     ret <- (newx >= 2.215)
#     ret0 <- na.omit(ret)
#     if ((length(ret0) > 1L) && (all(ret0) || !any(ret0))) 
#         warning("Dichotomized values are all-0 or all-1")
#     return(ret)
# }
# <environment: 0x123eb7f50>
# attr(,"class")
# [1] "node1"    "function"
m1 |>
  sapply(FUN = maxEff::get_cutoff)
#  kappa lambda 
#  2.455  2.215
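
Each printed rule is an ordinary R function that compares its input against the selected cut-off. The sketch below re-creates the kappa rule by hand to show what it returns on new values. The name rule_kappa and the input values are made up for illustration; in practice the rule comes from add_dummies() and is not written by the user.

```r
# hypothetical hand-written copy of the printed rule for `kappa`;
# the cut-off 2.455 is taken from the output above
rule_kappa = function(newx) {
  ret = (newx >= 2.455)
  ret0 = na.omit(ret)
  if ((length(ret0) > 1L) && (all(ret0) || !any(ret0))) 
    warning('Dichotomized values are all-0 or all-1')
  return(ret)
}

# made-up values, for illustration only
kappa_new = c(1.2, 2.455, 3.1, NA)
rule_kappa(kappa_new)
# [1] FALSE  TRUE  TRUE    NA
```

Values at or above the cut-off are mapped to TRUE, and missing values stay missing.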

4 boot_rule()

set.seed(143); m2 = m1 |>
  boot_rule(R = 30L)
stopifnot(length(m2) == 30L)
m2[[1L]] # rule of 1st bootstrap
# kappa :
# function (newx = kappa) 
# {
#     ret <- (newx >= 2.355)
#     ret0 <- na.omit(ret)
#     if ((length(ret0) > 1L) && (all(ret0) || !any(ret0))) 
#         warning("Dichotomized values are all-0 or all-1")
#     return(ret)
# }
# <environment: 0x13a8c73f8>
# attr(,"class")
# [1] "node1"    "function"
# 
# lambda :
# function (newx = lambda) 
# {
#     ret <- (newx >= 2.07)
#     ret0 <- na.omit(ret)
#     if ((length(ret0) > 1L) && (all(ret0) || !any(ret0))) 
#         warning("Dichotomized values are all-0 or all-1")
#     return(ret)
# }
# <environment: 0x13a8c4f28>
# attr(,"class")
# [1] "node1"    "function"
Cut-off values in m2
m2 |>
  lapply(FUN = \(i) sapply(i, FUN = maxEff::get_cutoff)) |>
  do.call(rbind, args = _)
#       kappa lambda
#  [1,] 2.355  2.070
#  [2,] 2.435  3.675
#  [3,] 2.425  2.325
#  [4,] 2.425  2.325
#  [5,] 2.570  2.085
#  [6,] 2.115  2.110
#  [7,] 2.150  2.215
#  [8,] 2.435  2.085
#  [9,] 2.435  2.215
# [10,] 2.195  3.590
# [11,] 2.370  2.170
# [12,] 2.455  3.595
# [13,] 2.715  3.675
# [14,] 2.465  2.140
# [15,] 2.465  2.105
# [16,] 2.455  2.065
# [17,] 2.410  2.215
# [18,] 2.460  2.215
# [19,] 2.465  2.215
# [20,] 2.135  4.380
# [21,] 2.370  2.135
# [22,] 2.405  2.105
# [23,] 2.425  2.315
# [24,] 3.110  2.275
# [25,] 2.440  2.215
# [26,] 2.570  3.685
# [27,] 2.455  2.325
# [28,] 2.345  2.215
# [29,] 2.425  2.155
# [30,] 2.385  2.065
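
The cut-offs vary across bootstrap samples because each sample re-selects its own dichotomizing rule. The sketch below illustrates this idea on simulated data with base R only. It is not the algorithm used by boot_rule(): the helper find_cutoff() and its selection criterion (largest absolute difference in group means) are assumptions chosen to keep the example short.

```r
# hypothetical helper (not from package boc or maxEff): choose the cut-off of `x`
# that maximizes the absolute difference in mean outcome between the two groups
find_cutoff = function(x, y, min_n = 10L) {
  xs = sort(unique(x))
  cand = (xs[-1L] + xs[-length(xs)]) / 2 # midpoints between adjacent unique values
  eff = vapply(cand, FUN = \(k) {
    hi = (x >= k)
    if (sum(hi) < min_n || sum(!hi) < min_n) return(NA_real_) # skip too-small groups
    abs(mean(y[hi]) - mean(y[!hi]))
  }, FUN.VALUE = NA_real_)
  cand[which.max(eff)] # which.max() ignores the NA candidates
}

set.seed(1)
n = 200L
x = rnorm(n) # simulated predictor
y = 0.8 * (x >= 0.5) + rnorm(n) # simulated outcome with a true cut-off at 0.5

R = 30L
boot_cut = replicate(R, expr = {
  id = sample.int(n, replace = TRUE) # bootstrap indices, sampled with replacement
  find_cutoff(x[id], y[id]) # cut-off re-selected in this bootstrap sample
})
summary(boot_cut)
```

The spread of boot_cut plays the same role as the spread of the kappa and lambda columns above: it shows how unstable a data-driven cut-off can be.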

5 boot_optimism()

set.seed(143); m3 = m1 |>
  boot_optimism(R = 30L)
head(m3) # just a matrix
#              [,1]         [,2]
# [1,]  0.082387339 -0.055144173
# [2,] -0.053245156  0.300264517
# [3,]  0.110567264 -0.008108603
# [4,] -0.009161665  0.141728259
# [5,]  0.139101392 -0.093180230
# [6,]  0.219917915  0.094292093
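
This section does not state how the two columns of m3 are defined. As general background, the sketch below shows the usual form of a bootstrap optimism estimate on simulated data: a model is re-fitted in each bootstrap sample, and its performance in that sample is compared with its performance on the original data. The linear model, the performance measure perf(), and all object names are assumptions for illustration, not the internals of boot_optimism().

```r
set.seed(1)
n = 100L
d = data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y = 0.5 * d$x1 + rnorm(n) # simulated data; x2 is pure noise

# hypothetical performance measure: squared correlation of observed and predicted outcome
perf = function(model, data) cor(data$y, predict(model, newdata = data))^2

m = lm(y ~ x1 + x2, data = d)
apparent = perf(m, data = d) # apparent performance on the original data

R = 30L
optimism = replicate(R, expr = {
  db = d[sample.int(n, replace = TRUE), ] # bootstrap sample
  mb = lm(y ~ x1 + x2, data = db) # model re-fitted in the bootstrap sample
  perf(mb, data = db) - perf(mb, data = d) # bootstrap-sample minus original-data performance
})

corrected = apparent - mean(optimism) # optimism-corrected performance
c(apparent = apparent, corrected = corrected)
```

Individual optimism values can be negative, as some rows of m3 are; the correction uses their average over all bootstrap samples.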

6 boc()

set.seed(143); m4 = m1 |>
  boc(R = 30L)
m4 |> 
  summary()
# Call:
# coxph(formula = OS ~ age + creatinine + kappa + lambda, data = data)
# 
#   n= 673, number of events= 673 
# 
#                coef exp(coef) se(coef)     z Pr(>|z|)    
# age        0.028132  1.028532 0.004541 6.195 5.81e-10 ***
# creatinine 0.285354  1.330233 0.063708 4.479 7.49e-06 ***
# kappaTRUE  0.375753  1.456087 0.120115 3.128  0.00176 ** 
# lambdaTRUE 0.108684  1.114810 0.102173 1.064  0.28745    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
#            exp(coef) exp(-coef) lower .95 upper .95
# age            1.029     0.9723    1.0194     1.038
# creatinine     1.330     0.7517    1.1741     1.507
# kappaTRUE      1.456     0.6868    1.1507     1.843
# lambdaTRUE     1.115     0.8970    0.9125     1.362
# 
# Concordance= 0.62  (se = 0.012 )
# Likelihood ratio test= 110.2  on 4 df,   p=<2e-16
# Wald test            = 121.9  on 4 df,   p=<2e-16
# Score (logrank) test = 126.5  on 4 df,   p=<2e-16

7 Terms & Abbreviations

Term / Abbreviation                   Description
CRAN, R                               The Comprehensive R Archive Network, https://cran.r-project.org
Depends, Imports, Suggests, Enhances  Writing R Extensions, Section 1.1.3 Package Dependencies
|>                                    Forward pipe operator, introduced in R 4.1.0
::                                    Explicitly-namespaced function or object
addmargins                            Add margins to arrays
as.environment                        Convert an object to an environment
abs                                   Absolute value
call                                  Unevaluated expression
coxph                                 Cox proportional hazards model
createDataPartition                   Test vs. training data set partition, from package caret (Kuhn 2008)
duplicated                            Duplicate elements
emptyenv                              Empty environment
environment                           Environment
eval                                  Evaluate an expression
factor                                Factor, or categorical variable
formals                               Formal arguments
closure, function                     R function
globalenv, .GlobalEnv                 Global environment
groupedHyperframe                     Grouped hyper data frame, from package groupedHyperframe (Zhan and Chervoneva 2025a)
head                                  First parts of an object
hypercolumns, hyperframe              (Hyper columns of) hyper data frame, from package spatstat.geom (Baddeley and Turner 2005)
inherits                              Class inheritance
labels                                Labels from object
levels                                Levels of a factor
list2env                              Convert a list to environment
listof                                List of objects
logistic                              Logistic regression model, stats::glm(., family = binomial('logit'))
matrix                                Matrix
median                                Median value
parent.env                            Parent environment
PFS                                   Progression/recurrence free survival, https://en.wikipedia.org/wiki/Progression-free_survival
predict                               Model prediction
quantile                              Quantile
rpart, rpart.object, node             Recursive partitioning and regression trees
S3, generic, methods                  S3 object oriented system, UseMethod; getS3method; https://adv-r.hadley.nz/s3.html
S4, generic, methods                  S4 object oriented system, isS4; setClass; setMethod; getMethod; https://adv-r.hadley.nz/s4.html
sort_by                               Sort an object by some criterion
str2lang                              To parse R expressions
subset                                Subsets of object by conditions
suppressWarnings                      Suppress warning messages
Surv                                  Survival, i.e., time-to-event, object, from package survival (T. M. Therneau 2024)
table                                 Cross tabulation
update                                Update and re-fit a model call

8 References

Allaire, JJ, and Christophe Dervieux. 2024. quarto: R Interface to ’Quarto’ Markdown Publishing System. https://doi.org/10.32614/CRAN.package.quarto.
Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2024. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Canty, Angelo, and B. D. Ripley. 2024. boot: Bootstrap R (S-Plus) Functions.
Baddeley, Adrian, and Rolf Turner. 2005. “spatstat: An R Package for Analyzing Spatial Point Patterns.” Journal of Statistical Software 12 (6): 1–42. https://doi.org/10.18637/jss.v012.i06.
Bengtsson, Henrik. 2025. matrixStats: Functions That Apply to Rows and Columns of Matrices (and to Vectors). https://doi.org/10.32614/CRAN.package.matrixStats.
Kuhn, Max. 2008. “Building Predictive Models in R Using the caret Package.” Journal of Statistical Software 28 (5): 1–26. https://doi.org/10.18637/jss.v028.i05.
Posit team. 2025. RStudio: Integrated Development Environment for R. Boston, MA: Posit Software, PBC. http://www.posit.co/.
Therneau, Terry M. 2024. A Package for Survival Analysis in R. https://CRAN.R-project.org/package=survival.
Therneau, Terry, and Beth Atkinson. 2025. rpart: Recursive Partitioning and Regression Trees. https://doi.org/10.32614/CRAN.package.rpart.
Xie, Yihui. 2025. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Zhan, Tingting, and Inna Chervoneva. 2025a. groupedHyperframe: Grouped Hyper Data Frame: An Extension of Hyper Data Frame. https://doi.org/10.32614/CRAN.package.groupedHyperframe.
———. 2025b. maxEff: Additional Predictor with Maximum Effect Size. https://doi.org/10.32614/CRAN.package.maxEff.