Radiomics and Use of AI for Prognostication

What it is and how not to do it!

Dr Santam Chakraborty

Tata Medical Center, Kolkata, WB

Disclosures

  • Research grants from ICMR, DBT, MHRD
  • Actively involved in AI for automatic segmentation (DRAW)
  • Spearheaded the development of the first oncology image bank in India (CHAVI)

What is prognostication

  • Prognosis: Estimate of average outcome (cure / survival) in a particular population, time and healthcare context.
  • Prognostic Factor: A characteristic of the disease, patient, treatment or healthcare system that affects the outcome. A prognostic factor identifies subgroups with similar prognosis.
  • Prognostic Model: These are statistical models that combine information from various prognostic factors to tailor outcome prediction for individuals.

Radiomics ➡️ Prognostic factors OR Prognostic model

AI ➡️ Prognostic model

Key metrics to understand

| Metric | Definition | Measurement |
|---|---|---|
| Calibration | How close the model-predicted outcome is to the observed outcome | Calibration plots, RMSE |
| Discrimination | How well the model can distinguish between those who do and do not have the outcome | AUC, C-index |
| Clinical utility | Ability of the model to add to clinical decision making | Net benefit analysis, impact trial |
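
As a rough illustration of the first two rows, the sketch below computes a rank-based AUC and a binned calibration RMSE in plain Python. The data, the function names and the choice of two risk bins are assumptions made for this example; real analyses would normally use an established statistics package and far more patients.

```python
import math

def auc(y_true, y_prob):
    """Probability that a random event case gets a higher predicted risk
    than a random non-event case (ties count as half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def calibration_rmse(y_true, y_prob, n_bins=2):
    """Bin patients by predicted risk, then compare the mean predicted risk
    with the observed event rate in each bin."""
    pairs = sorted(zip(y_prob, y_true))
    size = math.ceil(len(pairs) / n_bins)
    sq_errors = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        sq_errors.append((mean_pred - obs_rate) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical data: 1 = event occurred, 0 = no event
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
risks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
print(auc(outcomes, risks))
print(calibration_rmse(outcomes, risks))
```

A model can discriminate well and still be poorly calibrated, which is why the two are reported separately.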

Prognostic models aim to ASSIST clinicians in predicting future outcomes and HELP patients make a more informed choice regarding their treatment.

Relative vs Absolute Risk

Let's say an intervention reduces the risk of mortality by 20% (a relative risk reduction).

| Baseline 5-year survival | 95% | 90% | 80% | 60% |
|---|---|---|---|---|
| Deaths per 100 patients in 5 years | 5 | 10 | 20 | 40 |
| Extra people surviving with intervention | 1 | 2 | 4 | 8 |
| 5-year survival with intervention | 96% | 92% | 84% | 68% |
| Absolute benefit | 1% | 2% | 4% | 8% |

Randomized trials give very reliable and stable estimates of relative risks. However, the absolute magnitude of benefit from the intervention depends on the baseline risk.
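
The arithmetic behind this can be sketched in a few lines of Python. The function name is made up for this example; the inputs are the baseline survival figures and the 20% relative reduction in mortality used on this slide.

```python
def absolute_benefit(baseline_survival, relative_risk_reduction):
    """Gain in survival (as a fraction) when the risk of death is cut
    by the given relative amount."""
    baseline_mortality = 1.0 - baseline_survival
    return baseline_mortality * relative_risk_reduction

for survival in (0.95, 0.90, 0.80, 0.60):
    gain = absolute_benefit(survival, 0.20)
    print(f"baseline {survival:.0%}: "
          f"with intervention {survival + gain:.0%}, "
          f"absolute benefit {gain:.0%}")
```

The same relative effect gives an eightfold larger absolute benefit when baseline survival is 60% than when it is 95%.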

Prognostic models tell you which of the 5-year survival bins your patient falls into, allowing you and your patient to decide whether to give the intervention.


Radiomics: What is it

A branch of computer vision that relies on “extracting” quantitative information from images.

Remember that an “image” for a computer is a matrix of numbers.

CT numbers are Hounsfield units (HU), which reflect X-ray attenuation and are closely related to electron density

MRI numbers represent signal intensity, which depends on proton density and tissue relaxation properties

PET numbers represent the radioactivity present

Images encode information

  • The relationship of the “numbers” in the matrix can be used to infer “patterns”

  • It is known that these features will often occur in “clusters” which tend to associate with specific tumor features.

Radiomics profile of NSCLC


Types of Radiomics studies

  • Handcrafted radiomics: These rely on extraction of known “features” from the matrix of numbers and then applying various statistical and machine learning algorithms to derive inference.

  • Deep-learning based radiomics: These rely on the use of artificial neural networks to extract information from the image without the requirement for “feature” extraction.

Note

A “feature” is a way in which the numerical matrix is summarized.

Features in Radiomics

| Feature type | Description | Examples |
|---|---|---|
| Histogram based | Features of the distribution of the matrix of numbers | Mean, range |
| Texture based | Pattern of change in intensity between adjacent pixels (difference / similarity between adjacent numbers) | Gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM) |
| Shape based | Shape described by the numbers of interest | Shape elongation, mesh surface |
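
To make the first two rows concrete, here is a minimal sketch in plain Python on a made-up 4x4 region of interest. It computes two histogram features and a simple, non-symmetric co-occurrence count for the pixel immediately to the right; the function names are illustrative, and radiomics packages typically symmetrize and normalize the matrix and use many more gray levels.

```python
from collections import Counter

# Hypothetical 4x4 region of interest with gray levels 0..3
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

# Histogram-based (first-order) features ignore where each value sits.
values = [v for row in roi for v in row]
mean_intensity = sum(values) / len(values)
intensity_range = max(values) - min(values)

def glcm(image, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy)."""
    counts = Counter()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < len(image) and 0 <= c2 < len(row):
                counts[(value, image[r2][c2])] += 1
    return counts

def glcm_contrast(counts):
    """Texture feature: mean squared gray-level difference between neighbors."""
    total = sum(counts.values())
    return sum(n * (i - j) ** 2 for (i, j), n in counts.items()) / total

pairs = glcm(roi)
print(mean_intensity, intensity_range, glcm_contrast(pairs))
```

Shuffling the pixels would leave the histogram features unchanged but alter the co-occurrence counts, which is the sense in which texture features capture spatial pattern.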

Issue with feature extraction

  • Features are heavily dependent on the scan parameters, machine and the reconstruction algorithm used.

  • Currently available RTTQA recommendations for image quality are not sufficient for radiomics data quality assurance.

Variability in PET imaging parameters


Issues with feature extraction..

  • In this study, the investigators examined the impact of slice thickness and pixel spacing on radiomic feature variability.

  • As shown in the image, unless these parameters are harmonized, there is significant variability in a large number of radiomic features.

Feature selection

  • For handcrafted radiomics, more than 100 radiomic features can be extracted from images by common packages.

  • As a result, the dimensionality of the data (number of features) often far exceeds the number of observations available.

  • Training models using this high dimensional data results in an overfitted model.

  • Hence feature selection is often needed.
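
A small simulation shows why this matters. In the sketch below, 200 "features" and the outcome are all independent random noise for 20 hypothetical patients, yet the best feature still correlates noticeably with the outcome purely by chance. The sample size, feature count and random seed are arbitrary choices for this illustration.

```python
import random

random.seed(0)
n_patients, n_features = 20, 200

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Outcome and features are independent random noise: no true signal exists.
outcome = [random.gauss(0, 1) for _ in range(n_patients)]
features = [[random.gauss(0, 1) for _ in range(n_patients)]
            for _ in range(n_features)]

correlations = [abs(pearson(f, outcome)) for f in features]
best = max(correlations)
print(f"strongest |correlation| among {n_features} noise features: {best:.2f}")
```

Selecting features on the same data used to fit the model would pick up exactly these chance associations, so selection needs to happen inside the validation procedure.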

Overfitted model


Penalization and Shrinkage

  • These techniques are designed to prevent overfitting.

  • They shrink the predictor effect estimates towards zero, and some of them eliminate variables by shrinking their estimates all the way to zero.

  • As shown in the image to the right, at small sample sizes they can do the opposite of what they are supposed to do.
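
The two behaviors (shrinking and eliminating) can be sketched for the textbook special case of orthonormal predictors, where ridge and lasso estimates have simple closed forms in terms of the unpenalized estimate. The function names, the example estimates and the penalty value are assumptions for illustration; with correlated predictors the estimates are obtained by numerical optimization instead, and the penalty is usually tuned by cross-validation.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge: scale the estimate towards zero; it never becomes exactly zero."""
    return beta_ols / (1.0 + lam)

def lasso_shrink(beta_ols, lam):
    """Lasso: soft-threshold; estimates smaller than lam in magnitude
    are set to exactly zero, which removes the variable."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    return magnitude if beta_ols >= 0 else -magnitude

# Hypothetical unpenalized estimates for three predictors
estimates = {"feature_a": 2.0, "feature_b": -0.8, "feature_c": 0.3}
lam = 0.5
for name, beta in estimates.items():
    print(name, ridge_shrink(beta, lam), lasso_shrink(beta, lam))
```

Here `feature_c` is dropped by the lasso while ridge keeps all three predictors with smaller effects.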

Effect of sample size

Sample size requirements

AI : What is it

Simply put, AI is the development of machines that mimic human behavior and intelligence. Currently we have “Narrow AI”: agents designed to solve a specific problem.

  • Machine learning : Methods which learn from data using “automatic” analytic model building.

  • Deep learning: A subtype of machine learning which utilizes artificial neural networks to learn from data.

All DL is AI, but not all AI is DL

AI <> ML <> DL

What do we mean by deep learning?

  • Input data is processed in multiple “layers” which are connected to each other.
  • Each layer has the ability to recognize / create “features”, e.g. shapes, color.

  • The process learns which feature to place on which level on its own.

  • The layers transform the data into the “outcome” we are interested in.

  • Learn high level features from data

  • Feature extraction as done manually can be eliminated.
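
A single convolutional layer can be sketched in plain Python to show what "recognizing a feature" means. The image and the kernel weights below are made up; in a trained network the weights are learned from data and many such layers are stacked.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the core operation of a convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(v, 0) for v in row] for row in feature_map]

# Hypothetical 4x4 image: dark left half, bright right half
image = [[0, 0, 9, 9]] * 4
# Hand-set kernel that responds to a dark-to-bright vertical edge;
# in a trained network these weights are learned, not chosen by hand.
kernel = [[-1, 1],
          [-1, 1]]
feature_map = relu(conv2d(image, kernel))
print(feature_map)
```

The output is non-zero only in the column where the edge sits, so this layer has turned raw pixel values into an "edge present here" feature that later layers can combine.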

Architecture of U-Net, a convolutional neural network

Deep learning for prognostication

  • True deep learning models for prognostication are uncommon.

  • Most use a convolutional neural network where the model is optimized to predict the hazard (this is done through the use of a different loss function).

  • Performance has been better than radiomics in limited datasets.
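
The slide does not name the loss function. One commonly used choice for hazard prediction is the negative Cox partial log-likelihood, sketched below in plain Python as an assumption rather than as the method of any particular study. The function name and data are hypothetical, and tied event times are not handled.

```python
import math

def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: model outputs (log relative hazard), one per patient
    times: follow-up times; events: 1 if the event was observed, 0 if censored
    Assumes no tied event times (ties need Breslow or Efron handling).
    """
    loss, n_events = 0.0, 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under follow-up at time t_i
        risk_set = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        log_sum = math.log(sum(math.exp(s) for s in risk_set))
        loss -= risk_scores[i] - log_sum
        n_events += 1
    return loss / n_events

times = [2.0, 5.0, 7.0, 9.0]
events = [1, 1, 0, 1]
# Scores that rank early deaths as high risk should give a lower loss.
good = cox_neg_log_partial_likelihood([3.0, 2.0, 1.0, 0.0], times, events)
bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0, 3.0], times, events)
print(good, bad)
```

Unlike a classification loss, this loss uses the ordering of follow-up times and accounts for censoring, which is what makes the network's output interpretable as a relative hazard.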

Schon et al, Scientific Reports, 2024


Sources of Bias in AI models

https://www.cdc.gov/pcd/issues/2024/24_0245.htm
| Type | Reason | Mitigation |
|---|---|---|
| Experience & Expertise Bias | Poor quality data, lack of expertise in algorithm development and familiarity of provider with model during implementation | Development team needs diversity. Data collection with standardized protocols. Ongoing training. |
| Exclusion Bias | Missing or incomplete data or under-representation of population. Access to care for poor / marginalized needs special care. | Dataset should be as inclusive as possible. Equity audits to identify population exclusion. Ensuring accessibility of tools for marginalized. |

Sources of Bias in AI models ..

https://www.cdc.gov/pcd/issues/2024/24_0245.htm
| Type | Reason | Mitigation |
|---|---|---|
| Environment Bias | Social and physical environmental data is not included. | Integrate information about the healthcare environment and socio-economic factors. |
| Empathy Bias | Missing qualitative and human experience in the model dataset. Patient preferences not taken into account. | Design needs to be patient-centered, reviewed by ethics, and ensure appropriate ethical data is included. |
| Evidence Bias | Funding priorities, publication bias, as well as inclusion in the evidence base without understanding social context | Diversification of funding, transparent reporting and developing inclusive guidelines |

Learning encoded information

  • 17587 radiographs to classify fractures.

  • Models predicted scanner model, scanner brand and order priority better than predicting fracture

  • Best predictive ability when image data combined with patient clinical data and hospital process features.

  • In black-box DL models, these process variables may be unknowingly leveraged as predictors, which may lower reliability.

Implementation Challenges

  • Model needs to be integrated with clinical EMR systems

  • Impact of model on healthcare staff (mental / psychological).

  • Treatment recommendations and medical decision making also imply medico-legal issues.

How to do it properly

  • Ensure that you are solving a problem that patients experience.

  • Register your study protocol (look at TRIPOD-AI reporting guidelines to understand all elements that you need to incorporate).

  • Sufficient sample size (more complex algorithm -> greater sample size) (see https://www.prognosisresearch.com/guidance-prognostic-models)

  • Data needs to be properly curated (avoid missing values and measurement errors, and annotate outcomes properly).

  • Data should be prospectively collected (retrospective data subject to bias that cannot be analyzed away). Ideally a multicenter collaboration.

  • Validation studies (internal and external). Report calibration. Perform a decision curve analysis.

  • If using DL models, it is important to be cognizant of issues related to model fairness and representation.

  • Before deploying, perform a randomized impact trial.
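
The decision curve analysis mentioned above rests on the net benefit formula, TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold at which a clinician would treat. The sketch below evaluates it at a few thresholds on made-up data and compares the model with a "treat everyone" strategy; the function names and data are assumptions for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is at or
    above the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical data: 1 = event occurred, 0 = no event
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
risks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
for pt in (0.2, 0.35, 0.5):
    print(pt, net_benefit(outcomes, risks, pt),
          net_benefit_treat_all(outcomes, pt))
```

A model is clinically useful at a given threshold only if its net benefit exceeds both "treat all" and "treat none" (net benefit of zero); plotting this across thresholds gives the decision curve.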

Acknowledgements

This presentation is heavily influenced by the extensive work on prognostic and predictive modelling done under the PROGRESS framework, available at https://www.prognosisresearch.com

Conclusion

“We need less research, better research and research that makes a difference to patients' lives”

-Prof Altman

We invite you to join us at CHAVI and conduct meaningful and generalizable research using radiomics & AI. (https://chavi.ai)

Join us at AROICON 2025 in Kolkata