Supervised Learning Tools for Deriving Biomarkers based on Single-Cell Data

Authors
Affiliation

Tingting Zhan

Thomas Jefferson University & Hospitals

Inna Chervoneva

Thomas Jefferson University & Hospitals

Published

October 14, 2025

1 Introduction

This vignette provides examples of using R packages

for deriving single index predictors of scalar outcomes based on spatial and non-spatial single-cell imaging data.

1.1 Prerequisite

Packages hyper.gam (v 0.2.0) require R version 4.5.0 (released 2025-04-11) or higher (macOS, Windows, Linux).

Environment on author’s computer
Sys.info()[c('sysname', 'release', 'machine')]
#>  sysname  release  machine 
#> "Darwin" "25.1.0"  "arm64"
R.version
#>                _                           
#> platform       aarch64-apple-darwin20      
#> arch           aarch64                     
#> os             darwin20                    
#> system         aarch64, darwin20           
#> status                                     
#> major          4                           
#> minor          5.1                         
#> year           2025                        
#> month          06                          
#> day            13                          
#> svn rev        88306                       
#> language       R                           
#> version.string R version 4.5.1 (2025-06-13)
#> nickname       Great Square Root
utils::install.packages('groupedHyperframe')
utils::install.packages('hyper.gam')

Readers are encouraged to learn more about the full details of package hyper.gam in Appendix Section 6.1. Appendix Section 6.1.1 provides additional details about the development environment and tools. Appendix Section 6.1.2 provides an overview of how package hyper.gam extends the functionality of the packages mgcv (Wood 2017, v1.9.3). Appendix Section 6.1.3 outlines the package dependencies of hyper.gam. Appendix Section 6.1.5 outlines the procedures for installing the development versions of package hyper.gam.

1.2 Getting Started

Examples in this vignette require that the search path has

library(groupedHyperframe)
library(hyper.gam)
#> Registered S3 method overwritten by 'pROC':
#>   method   from            
#>   plot.roc spatstat.explore
library(survival)

Appendix Section 6.1.4 explains the function name conflicts indicated by the Registered S3 method overwritten message.

1.3 Acknowledgement

This work is supported by National Institutes of Health, U.S. Department of Health and Human Services grants

The authors thank

  • Erjia Cui’s contribution to function hyper_gam().

2 Data Structure

Single-cell multiplex immuno-fluorescence immunohistochemistry (mIF-IHC) imaging data are the result of digital processing of the microscopic images of tissue stained with selected antibodies. Quantitative pathology platforms, e.g., Akoya or QuPath, support cell segmentation of mIF-IHC images and quantification of the mean protein expression in each cell. The cell centroid coordinates and cell signal intensities (CSIs) for each stained protein are usually extracted as individual comma-separated values .csv files. For each cell in a tissue image, the data include the cell centroid coordinates and cell signal intensity (CSI) for each quantified protein expression. The data may have multiple levels of hierarchical clustering. For example, single cells are clustered within a Region of Interest (ROI) or a tissue core, ROIs are clustered within a tissue or tissue cores are clustered within a patient.

3 Quantile Index (Zhan, Yi, and Chervoneva 2025)

To cite the implementation of the Quantile Index (QI) methodology, please use

Zhan T, Yi M, Chervoneva I (2025). “Quantile Index predictors using R package hyper.gam.” Bioinformatics, 41(8), btaf430. ISSN 1367-4811, doi:10.1093/bioinformatics/btaf430 https://doi.org/10.1093/bioinformatics/btaf430.

BibTeX and/or BibLaTeX entries for LaTeX users
Warning in citation(package = "groupedHyperframe"): could not determine year
for 'groupedHyperframe' from package DESCRIPTION file
@Article{,
  title = {Quantile Index predictors using R package `hyper.gam`},
  author = {Tingting Zhan and Misung Yi and Inna Chervoneva},
  journal = {Bioinformatics},
  volume = {41},
  number = {8},
  pages = {btaf430},
  year = {2025},
  month = {07},
  issn = {1367-4811},
  doi = {10.1093/bioinformatics/btaf430},
}

@Manual{,
  title = {groupedHyperframe: Grouped Hyper Data Frame: An Extension of
Hyper Data Frame},
  author = {Tingting Zhan and Inna Chervoneva},
  note = {R package version 0.3.0},
  url = {https://github.com/tingtingzhan/groupedHyperframe},
}

as well as Yi et al. (2023b); Yi et al. (2023a); Yi et al. (2025).

3.1 Compute Aggregated Quantiles

We use the data example Ki67 from package groupedHyperframe in this non-spatial application.

Ki67q = groupedHyperframe::Ki67 |>
  within.data.frame(expr = {
    x = y = NULL # remove x- and y-coords for non-spatial application
  }) |>
  as.groupedHyperframe(group = ~ patientID/tissueID) |>
  quantile(probs = seq.int(from = .01, to = .99, by = .01)) |>
  aggregate(by = ~ patientID)

The returned object Ki67q is a hyper data frame with

  • a numeric-hypercolumn of the aggregated sample quantiles logKi67.quantile per patientID. These quantiles are calculated at a pre-specified grid of probabilities \{p_k, k=1,\cdots,K \} \in [0,1]. Note that the aggregation must be performed at the level of biologically independent clusters, e.g., ~patientID, to produce independent quantile predictors.
  • Metadata, including the outcome of interest, e.g., progression free survival PFS, Her2, HR, etc.
A hyper data frame Ki67q: aggregated quantiles
Ki67q |>
  head()
#> Hyperframe:
#>   Tstage  PFS adj_rad adj_chemo histology  Her2   HR  node  race age patientID
#> 1      2 100+   FALSE     FALSE         3  TRUE TRUE  TRUE White  66   PT00037
#> 2      1   22   FALSE     FALSE         3 FALSE TRUE FALSE Black  42   PT00039
#> 3      1  99+   FALSE        NA         3 FALSE TRUE FALSE White  60   PT00040
#> 4      1  99+   FALSE      TRUE         3 FALSE TRUE  TRUE White  53   PT00042
#> 5      1  112    TRUE      TRUE         3 FALSE TRUE  TRUE White  52   PT00054
#> 6      4   12    TRUE     FALSE         2  TRUE TRUE  TRUE Black  51   PT00059
#>   logKi67.quantile
#> 1        (numeric)
#> 2        (numeric)
#> 3        (numeric)
#> 4        (numeric)
#> 5        (numeric)
#> 6        (numeric)

Users are encouraged to learn more about

Sections in package groupedHyperframe vignette
Function Purpose Section
as.groupedHyperframe() to convert a data frame into a grouped hyper data frame 2
quantile() to calculate the quantiles of each numeric-vector in the numeric-hypercolumn 2
aggregate() to aggregate the quantiles over multiple tissues per patient by point-wise means 2

3.2 Estimate Integrand Surface

Linear quantile index (QI) (Equation 1) is a predictor in a functional generalized linear model (James 2002) for outcomes from the exponential family of distributions, or a linear functional Cox model (Gellar et al. 2015) for survival outcomes,

\text{QI}_{i}=\int_{0}^{1} \beta(p)Q_i(p)dp \tag{1}

where Q_i(p) is the (aggregated) sample quantiles logKi67.quantile for the i-th subject, and \beta(p) is the unknown coefficient function to be estimated. We use function hyper_gam() to fit a generalized additive model gam with integrated linear spline-based smoothness estimation (function mgcv::s(), Wood 2003). This is a scalar-on-function model (Reiss et al. 2017) that predicts a scalar outcome (e.g., progression free survival time PFS[,1L]) using the aggregated quantiles function as a functional predictor.

m0 = hyper_gam(PFS ~ logKi67.quantile, data = Ki67q)

Nonlinear quantile index (nlQI) (Equation 2) is a predictor in the functional generalized additive model (McLean et al. 2014) for outcomes from the exponential family of distributions, or an additive functional Cox model (Cui, Crainiceanu, and Leroux 2021) for survival outcomes.

\text{nlQI}_{i}= \int_{0}^{1} F\big(p, Q_i(p)\big)dp \tag{2}

where F(\cdot,\cdot) is an unknown bivariate twice differentiable function. We use function hyper_gam(., nonlinear = TRUE) to fit a generalized additive model gam with tensor product interaction estimation (function mgcv::ti(), Wood 2006).

m1 = hyper_gam(PFS ~ logKi67.quantile, data = Ki67q, nonlinear = TRUE)

3.2.1 S3 Class hyper_gam

The fitted functional model m0 and m1 have the S3 class hyper_gam, which inherits from the S3 class gam from package mgcv (Wood 2017). The hyper_gam class has an additional attribute,

  • attr(., 'xname'), a symbol of the hypercolumn name.

Such inheritance enables the use of all S3 method dispatches on gam objects from package mgcv on the hyper_gam objects.

3.2.2 Visualization

Function integrandSurface() creates an interactive htmlwidget (Vaidyanathan et al. 2023) visualization of the estimated integrand surfaces for the linear (Equation 1) or nonlinear quantile index (Equation 2) using package plotly (Sievert 2020). The integrand surfaces, defined on p\in[0,1] and q\in\text{range}\big\{Q_i(p), i=1,\cdots,n\big\}, are

\hat{S}(p,q) = \begin{cases} \hat{\beta}(p)\cdot q\\ \hat{F}(p,q) \end{cases} \tag{3}

Also in this interactive visualization are

  • the contour lines on the integrand surfaces (Equation 3), as well as their projections along the s-axis, i.e., onto the (p,q)-plane (a.k.a., the “floor”);
  • the estimated linear integrand paths \hat{\beta}(p)Q_i(p) or the nonlinear integrand paths \hat{F}(p, Q_i(p)) on the integrand surfaces (Equation 3);
  • the sample quantiles Q_i(p), i.e., the projections of the estimated linear or nonlinear integrand path along the s-axis, i.e., onto the (p,q)-plane (a.k.a., the “floor”);
  • the projections of the estimated linear or nonlinear integrand path along the q-axis, i.e., onto the (p,s)-plane (a.k.a., the “backwall”), so that the area under each projected path is equal to the estimated linear (Equation 1) or nonlinear quantile index (Equation 2).

Figure 1 is a collage (via package htmltools, Cheng et al. 2024) of the interactive htmlwidget visualizations of the linear and nonlinear integrand surfaces, contours, integrand paths and their projections to the “floor” and “backwall”. Users should remove the argument n in function call integrandSurface(, n=21L), and use the default n=501L instead, for a more refined surface. The authors must use n=21L to reduce the htmlwidget objects size, in order to comply with CRAN and/or RPubs file size limit.

m0 |> integrandSurface(n = 21L)
m1 |> integrandSurface(n = 21L)
R code: collage multiple htmlwidgets
htmltools::tagList(
  m0 |> integrandSurface(n = 21L), 
  m1 |> integrandSurface(n = 21L)
) |> 
  htmltools::browsable()
Figure 1: Linear (top) and nonlinear (bottom) integrand surfaces, contours, integrand paths and their projections to the “floor” and “backwall”

Static illustrations of the estimated integrand surfaces, e.g., the perspective (S3 method dispatch persp.hyper_gam()) and contour (S3 method dispatch contour.hyper_gam()) plots, are produced by calling the S3 generics persp() and contour() in package graphics shipped with vanilla R. These static figures are suppressed to reduce the file size of this vignette.

m0 |> persp() # a static figure
m0 |> contour() # a static figure
m1 |> persp() # a static figure
m1 |> contour() # a static figure

Visualization of the integrand surface (Equation 3) in functions integrandSurface(), persp.hyper_gam() and contour.hyper_gam() is inspired by function mgcv::vis.gam(). Visualization of the integrand paths, as well as their projections on the (p,q)- and (p,s)-plane, is an original idea and design by Tingting Zhan.

3.3 Compute Quantile Index Predictor

Linear and nonlinear quantile indices are the predictors in the functional models (Equation 1) and (Equation 2), respectively. Let’s consider a conventional scenario that we first fit a hyper_gam model to the training data set, then compute the quantile index predictors in the training and/or test data set using the training model.

First, we partition the 622 patients in hyper data frame Ki67q into a training data set with 498 patients and a test data set with 124 patients, i.e., a 80% vs. 20% partition.

set.seed(16); id = Ki67q |> nrow() |> seq_len() |> caret::createDataPartition(p = .8)
Ki67q_0 = Ki67q[id[[1L]],] # training set
Ki67q_1 = Ki67q[-id[[1L]],] # test set

Next, we fit a functional generalized additive model to the the training data set Ki67q_0,

m1a = hyper_gam(PFS ~ logKi67.quantile, nonlinear = TRUE, data = Ki67q_0)

The S3 method dispatch hyper.gam::predict.hyper_gam() calculates the quantile index predictors of the training and/or test data set, based on the training model m1a. The returned value is a numeric vector. This is a convenient wrapper and slight modification of the function mgcv::predict.gam(). The use of S3 generic stats::predict(), which is typically for predicted values, could be confusing, but we choose to follow the practice and nomenclature of function mgcv::predict.gam().

We can, but we should not, use the quantile index predictors of the training data set for downstream analysis, because these quantile index predictors are optimized on the training data set and the results would be optimistically biased.

Optimistically biased!!
Ki67q_0[,c('PFS', 'age', 'race')] |> 
  as.data.frame() |> # invokes spatstat.geom::as.data.frame.hyperframe()
  data.frame(nlQI = predict(m1a, newdata = Ki67q_0)) |>
  coxph(formula = PFS ~ age + nlQI, data = _)
#> Call:
#> coxph(formula = PFS ~ age + nlQI, data = data.frame(as.data.frame(Ki67q_0[, 
#>     c("PFS", "age", "race")]), nlQI = predict(m1a, newdata = Ki67q_0)))
#> 
#>           coef exp(coef)  se(coef)      z       p
#> age  -0.023320  0.976950  0.008321 -2.803 0.00507
#> nlQI  1.118713  3.060911  0.343152  3.260 0.00111
#> 
#> Likelihood ratio test=23.24  on 2 df, p=8.993e-06
#> n= 498, number of events= 99

Instead, we should use the quantile index predictors computed in the test data set for downstream analysis,

Ki67q_1[,c('PFS', 'age', 'race')] |> 
  as.data.frame() |> # invokes spatstat.geom::as.data.frame.hyperframe()
  data.frame(nlQI = predict(m1a, newdata = Ki67q_1)) |>
  coxph(formula = PFS ~ age + nlQI, data = _)
#> Call:
#> coxph(formula = PFS ~ age + nlQI, data = data.frame(as.data.frame(Ki67q_1[, 
#>     c("PFS", "age", "race")]), nlQI = predict(m1a, newdata = Ki67q_1)))
#> 
#>          coef exp(coef) se(coef)      z     p
#> age  -0.01636   0.98377  0.01862 -0.879 0.380
#> nlQI  1.21050   3.35516  0.75858  1.596 0.111
#> 
#> Likelihood ratio test=3.42  on 2 df, p=0.1805
#> n= 124, number of events= 19

4 🚧 Spatial Index for Point Pattern with Continuous Marks

🚧🚧 This section is under construction 🚧🚧

Spatial version of groupedHyperframe

Ki67s = groupedHyperframe::Ki67 |>
  grouped_ppp(formula = logKi67 ~ . | patientID/tissueID, data = _)
Spatial groupedHyperframe Ki67s
Ki67s
#> Grouped Hyperframe: ~patientID/tissueID
#> 
#> 645 tissueID nested in
#> 622 patientID
#> 
#> Preview of first 10 (or less) rows:
#> 
#>    patientID tissueID Tstage  PFS adj_rad adj_chemo histology  Her2   HR  node
#> 1    PT00037 TJUe_I17      2 100+   FALSE     FALSE         3  TRUE TRUE  TRUE
#> 2    PT00039 TJUe_G17      1   22   FALSE     FALSE         3 FALSE TRUE FALSE
#> 3    PT00040 TJUe_F17      1  99+   FALSE        NA         3 FALSE TRUE FALSE
#> 4    PT00042 TJUe_D17      1  99+   FALSE      TRUE         3 FALSE TRUE  TRUE
#> 5    PT00054 TJUe_J18      1  112    TRUE      TRUE         3 FALSE TRUE  TRUE
#> 6    PT00059 TJUe_N17      4   12    TRUE     FALSE         2  TRUE TRUE  TRUE
#> 7    PT00062 TJUe_J17      2  64+   FALSE     FALSE         3 FALSE TRUE  TRUE
#> 8    PT00068 TJUe_F19      2  56+   FALSE     FALSE         2  TRUE TRUE  TRUE
#> 9    PT00082 TJUe_P19      2  79+   FALSE     FALSE         3  TRUE TRUE FALSE
#> 10   PT00084 TJUe_O19      2   26   FALSE      TRUE         2  TRUE TRUE FALSE
#>     race age  ppp.
#> 1  White  66 (ppp)
#> 2  Black  42 (ppp)
#> 3  White  60 (ppp)
#> 4  White  53 (ppp)
#> 5  White  52 (ppp)
#> 6  Black  51 (ppp)
#> 7  Asian  50 (ppp)
#> 8  White  37 (ppp)
#> 9  White  68 (ppp)
#> 10 Black  55 (ppp)

Recommended r_\text{max}.

.rmax = Ki67s |> .rmax(fun = 'K')
#> Default rmax for hypercolumn ppp. are 645⨯ 404.00
Ki67.E = Ki67s |>
  Emark_(r = seq.int(from = 0, to = 180, by = 1), correction = 'none') |>
  summary_fv() |>
  aggregate(by = ~ patientID, fun = pmean)
#> 
Ki67.E |> head()
#> Hyperframe:
#>   patientID Tstage  PFS adj_rad adj_chemo histology  Her2   HR  node  race age
#> 1   PT00037      2 100+   FALSE     FALSE         3  TRUE TRUE  TRUE White  66
#> 2   PT00039      1   22   FALSE     FALSE         3 FALSE TRUE FALSE Black  42
#> 3   PT00040      1  99+   FALSE        NA         3 FALSE TRUE FALSE White  60
#> 4   PT00042      1  99+   FALSE      TRUE         3 FALSE TRUE  TRUE White  53
#> 5   PT00054      1  112    TRUE      TRUE         3 FALSE TRUE  TRUE White  52
#> 6   PT00059      4   12    TRUE     FALSE         2  TRUE TRUE  TRUE Black  51
#>   logKi67.E.y logKi67.E.cumtrapz logKi67.E.cumvtrapz      ppp. logKi67.E
#> 1   (numeric)          (numeric)           (numeric) (ppplist)  (fvlist)
#> 2   (numeric)          (numeric)           (numeric) (ppplist)  (fvlist)
#> 3   (numeric)          (numeric)           (numeric) (ppplist)  (fvlist)
#> 4   (numeric)          (numeric)           (numeric) (ppplist)  (fvlist)
#> 5   (numeric)          (numeric)           (numeric) (ppplist)  (fvlist)
#> 6   (numeric)          (numeric)           (numeric) (ppplist)  (fvlist)
x0 = hyper_gam(PFS ~ logKi67.E.cumvtrapz, data = Ki67.E)
x1 = hyper_gam(PFS ~ logKi67.E.cumvtrapz, data = Ki67.E, nonlinear = TRUE)
library(htmltools)
tagList(
  x0 |> integrandSurface(n = 21L), 
  x1 |> integrandSurface(n = 21L)
) |> 
  browsable()
# figure suppressed to reduce vignette size

5 🚧 Spatial Index for Point Pattern with Multi-Type Marks

🚧🚧 This section is under construction 🚧🚧

6 Appendix

6.1 About

R terminology may differ from that commonly used in mathematics and statistics. For explanations and references to the terms and abbreviations used in this vignette, please refer to Section 7.

6.1.1 Environment

This vignette is created under R version 4.5.1 (2025-06-13) with the aid of packages knitr (Xie 2025, v1.50; 2015; 2014), quarto (Allaire and Dervieux 2024, v1.5.1 with Quarto v1.8.25) and rmarkdown (Allaire et al. 2025, v2.30; Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020). Documentation and check are supported by packages roxygen2 (Wickham, Danenberg, et al. 2025, v7.3.3) and devtools (Wickham, Hester, et al. 2025, v2.4.6).

An Integrated Development Environment (IDE), e.g., RStudio (Posit Team 2025) or Positron, is not required, but highly recommended.

6.1.2 Enhancement

Package hyper.gam (v0.2.0) Enhances the package mgcv (Wood 2017, v1.9.3), with

  • gam on hyper-column

6.1.3 Dependency

Package hyper.gam Imports packages

Package hyper.gam Suggests packages

Package groupedHyperframe (CRAN, RPubs, v0.3.0) dependencies are outlined in its separate vignette.

6.1.4 Function Name Conflicts

As of package spatstat.explore (Baddeley, Rubak, and Turner 2015, v3.5.3.3) and pROC (Robin et al. 2011, v1.19.0.1), we have a function name clash between spatstat.explore::plot.roc() and pROC::plot.roc() when loading both packages groupedHyperframe (which Imports package spatstat.explore) and hyper.gam (which Imports package caret which Imports package pROC). This function name clash is potentially hazardous as the S3 classes spatstat.explore::roc and pROC::roc are totally different.

6.1.5 Development Version

The development version of package hyper.gam, which includes experimental and potentially unstable features, is released extremely frequently to Github.

Readers should repeat the calls to function pak::pak() (Csárdi and Hester 2025) until each call prints the message

ℹ No downloads are needed

Review: function pak::pak() (Csárdi and Hester 2025), install development versions
pak::pak('tingtingzhan/hyper.gam', ask = FALSE)

Alternatively, readers should repeat the calls to function remotes::install_github() (Csárdi et al. 2024) until each call prints the message,

Skipping install of '_PACKAGE_' from a github remote, the SHA1 (_0000000d_) has not changed since last install.

Review: function remotes::install_github() (Csárdi et al. 2024), install development versions
remotes::install_github('tingtingzhan/hyper.gam', upgrade = 'always')

Installation instructions for the development version of package groupedHyperframe are provided in a separate vignette.

7 Terms & Abbreviations

Table 1 presents the terms and abbreviations from imported packages that are used in this vignette, which is folded to avoid overwhelming the reader with its excessive length.

Table 1: Terms & Abbreviations in imported packages
Table 1: Terms & Abbreviations in R
Term / Abbreviation Description
createDataPartition Test vs. training data set partition, from package caret (Kuhn 2008)
coxph Cox proportional hazards model, from package survival (Therneau 2024)
gam Generalized additive models (GAM), from package mgcv (Wood 2017)
groupedHyperframe Grouped hyper data frame, from package groupedHyperframe (Zhan and Chervoneva 2025a)
hypercolumns, hyperframe (Hyper columns of) hyper data frame, from package spatstat.geom (Baddeley and Turner 2005)
htmlwidget HyperText Markup Language (HTML) widgets, from package htmlwidgets (Vaidyanathan et al. 2023), https://www.htmlwidgets.org, https://plotly.com/r/getting-started/
marks, marked Marks of a point pattern
mgcv::s (Set up of) spline based smooths (Wood 2003)
mgcv::ti Tensor product interaction (Wood 2006)
PFS Progression/recurrence free survival, https://en.wikipedia.org/wiki/Progression-free_survival
ppp, ppp.object (Marked) point pattern
predict.gam GAM model predictor
Surv Survival, i.e., time-to-event, object, from package survival (Therneau 2024)

A comprehensive glossary of R terms and abbreviations used in this vignette can be found at

https://rpubs.com/tingtingzhan/R_terms

8 References

Allaire, JJ, and Christophe Dervieux. 2024. quarto: R Interface to ‘Quarto‘ Markdown Publishing System. https://CRAN.R-project.org/package=quarto.
Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2025. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns: Methodology and Applications with R. London: Chapman; Hall/CRC Press. https://www.routledge.com/Spatial-Point-Patterns-Methodology-and-Applications-with-R/Baddeley-Rubak-Turner/p/book/9781482210200/.
Baddeley, Adrian, and Rolf Turner. 2005. spatstat: An R Package for Analyzing Spatial Point Patterns.” Journal of Statistical Software 12 (6): 1–42. https://doi.org/10.18637/jss.v012.i06.
Cheng, Joe, Carson Sievert, Barret Schloerke, Winston Chang, Yihui Xie, and Jeff Allen. 2024. htmltools: Tools for HTML. https://CRAN.R-project.org/package=htmltools.
Csárdi, Gábor. 2025. cli: Helpers for Developing Command Line Interfaces. https://CRAN.R-project.org/package=cli.
Csárdi, Gábor, and Jim Hester. 2025. pak: Another Approach to Package Installation. https://CRAN.R-project.org/package=pak.
Csárdi, Gábor, Jim Hester, Hadley Wickham, Winston Chang, Martin Morgan, and Dan Tenenbaum. 2024. remotes: R Package Installation from Remote Repositories, Including ’GitHub’. https://CRAN.R-project.org/package=remotes.
Cui, Erjia, Ciprian M. Crainiceanu, and Andrew Leroux. 2021. “Additive Functional Cox Model.” Journal of Computational and Graphical Statistics 30 (3): 780–93. https://doi.org/10.1080/10618600.2020.1853550.
Gellar, Jonathan E., Elizabeth Colantuoni, Dale M. Needham, and Ciprian M. Crainiceanu. 2015. “Cox Regression Models with Functional Covariates for Survival Data.” Statistical Modelling 15 (3): 256–78. https://doi.org/10.1177/1471082X14565526.
James, Gareth M. 2002. “Generalized Linear Models with Functional Predictors.” Journal of the Royal Statistical Society Series B: Statistical Methodology 64 (3): 411–32. https://doi.org/10.1111/1467-9868.00342.
Kuhn, Max. 2008. “Building Predictive Models in R Using the ‘Caret‘ Package.” Journal of Statistical Software 28 (5): 1–26. https://doi.org/10.18637/jss.v028.i05.
McLean, Mathew W., Giles Hooker, Ana-Maria Staicu, Fabian Scheipl, and David Ruppert. 2014. “Functional Generalized Additive Models.” Journal of Computational and Graphical Statistics 23 (1): 249–69. https://doi.org/10.1080/10618600.2012.729985.
Pinheiro, José, Douglas Bates, and R Core Team. 2025. nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.
Posit Team. 2025. RStudio: Integrated Development Environment for R. Boston, MA: Posit Software, PBC. https://posit.co/.
Reiss, Philip T., Jeff Goldsmith, Han Lin Shang, and R. Todd Ogden. 2017. “Methods for Scalar-on-Function Regression.” International Statistical Review 85 (2): 228–49. https://doi.org/10.1111/insr.12163.
Robin, Xavier, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frédérique Lisacek, Jean-Charles Sanchez, and Markus Müller. 2011. pROC: An Open-Source Package for R and S+ to Analyze and Compare ROC Curves.” BMC Bioinformatics 12: 77. https://doi.org/10.1186/1471-2105-12-77.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, ‘Plotly‘, and ‘Shiny‘. Chapman; Hall/CRC. https://plotly-r.com.
Therneau, Terry M. 2024. A Package for Survival Analysis in R. https://CRAN.R-project.org/package=survival.
Vaidyanathan, Ramnath, Yihui Xie, JJ Allaire, Joe Cheng, Carson Sievert, and Kenton Russell. 2023. htmlwidgets: HTML Widgets for R. https://CRAN.R-project.org/package=htmlwidgets.
Wickham, Hadley, Peter Danenberg, Gábor Csárdi, and Manuel Eugster. 2025. roxygen2: In-Line Documentation for R. https://CRAN.R-project.org/package=roxygen2.
Wickham, Hadley, Jim Hester, Winston Chang, and Jennifer Bryan. 2025. devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.
Wood, Simon N. 2003. “Thin-Plate Regression Splines.” Journal of the Royal Statistical Society B: Statistical Methodology 65 (1): 95–114. https://doi.org/10.1111/1467-9868.00374.
———. 2006. “Low-Rank Scale-Invariant Tensor Product Smooths for Generalized Additive Mixed Models.” Biometrics 62 (4): 1025–36. https://doi.org/10.1111/j.1541-0420.2006.00574.x.
———. 2017. Generalized Additive Models: An Introduction with R. 2nd ed. Chapman; Hall/CRC.
Xie, Yihui. 2014. knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC.
———. 2015. Dynamic Documents with R and knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.org/knitr/.
———. 2025. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.
Yi, Misung, Tingting Zhan, Amy R. Peck, Jeffrey A. Hooke, Albert J. Kovatich, Craig D. Shriver, Hai Hu, Yunguang Sun, Hallgeir Rui, and Inna Chervoneva. 2023a. “Quantile Index Biomarkers Based on Single-Cell Expression Data.” Laboratory Investigation 103 (8): 100158. https://doi.org/10.1016/j.labinv.2023.100158.
———. 2023b. “Selection of Optimal Quantile Protein Biomarkers Based on Cell-Level Immunohistochemistry Data.” BMC Bioinformatics 24 (1): 298. https://doi.org/10.1186/s12859-023-05408-8.
Yi, Misung, Tingting Zhan, Hallgeir Rui, and Inna Chervoneva. 2025. “Functional Protein Biomarkers Based on Distributions of Expression Levels in Single-Cell Imaging Data.” Bioinformatics 41 (5): btaf182. https://doi.org/10.1093/bioinformatics/btaf182.
Zhan, Tingting, and Inna Chervoneva. 2025a. groupedHyperframe: Grouped Hyper Data Frame: An Extension of Hyper Data Frame. https://CRAN.R-project.org/package=groupedHyperframe.
———. 2025b. hyper.gam: Generalized Additive Models with Hyper Column. https://CRAN.R-project.org/package=hyper.gam.
Zhan, Tingting, Misung Yi, and Inna Chervoneva. 2025. “Quantile Index Predictors Using r Package ‘Hyper.gam‘.” Bioinformatics 41 (8): btaf430. https://doi.org/10.1093/bioinformatics/btaf430.