Radiomics and Use of AI for Prognostication

Disclosures

Research grants from ICMR, DBT, MHRD
Actively involved in AI for automatic segmentation (DRAW)
Spearheaded the development of the first oncology image bank in India (CHAVI)

What is prognostication

Prognosis: Estimate of average outcome (cure / survival) in a particular population, time and healthcare context.
Prognostic Factor: Characteristics of the disease, patient, treatment or healthcare system that affect the outcome. A prognostic factor identifies subgroups with similar prognosis.
Prognostic Model: These are statistical models that combine information from various prognostic factors to tailor outcome prediction for individuals.

Radiomics ➡️ Prognostic factors OR Prognostic model

AI ➡️ Prognostic model

Key metrics to understand

Metric	Definition	Measurement
Calibration	How close the model-predicted outcome is to the observed outcome	Calibration plots, RMSE
Discrimination	How well the model can distinguish between those who do and do not have the outcome	AUC, C-index
Clinical utility	Ability of the model to add to clinical decision making	Net benefit analysis, impact trial
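As a rough illustration of the first two rows, the sketch below computes discrimination and an RMSE on a toy example. The data are hypothetical, and the RMSE is taken between predicted probabilities and the observed 0/1 outcomes, which is only one possible reading of the metric named in the table. For a binary outcome the C-index and the AUC are the same quantity.

```python
from math import sqrt

def auc(y_true, y_prob):
    """Probability that a random patient with the outcome is given a higher
    predicted risk than a random patient without it (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need patients both with and without the outcome")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def rmse(y_true, y_prob):
    """Root mean squared difference between predicted risk and observed outcome."""
    return sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true))

# Hypothetical toy data: observed outcome (1 = event) and predicted risk.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print("AUC / C-index:", auc(y_true, y_prob))
print("RMSE:", round(rmse(y_true, y_prob), 3))
```

A model can discriminate well and still be poorly calibrated, so both numbers are worth reporting together.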

Prognostic models aim to ASSIST clinicians in predicting future outcomes and HELP patients make a more informed choice regarding their treatment.

Relative vs Absolute Risk

Let's say an intervention reduces the relative risk of mortality by 20%.

5 year survival	95%	90%	80%	60%
N dying out of 100 in 5 yrs	5	10	20	40
Extra People surviving with intervention	1	2	4	8
Survival at 5 years with intervention	96%	92%	84%	68%
Absolute Benefit	1%	2%	4%	8%

Randomized trials give very reliable and stable estimates of relative risks. However, the absolute magnitude of benefit from the intervention depends on the baseline risk.

Prognostic models allow you to tell which of the 5 year survival bins your patient will fall into, allowing you and your patient to decide whether or not to give the intervention.
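The arithmetic behind the table can be sketched in a few lines. The function name is illustrative, and the 20% relative risk reduction is the value assumed in the example above.

```python
def absolute_benefit(baseline_survival, relative_risk_reduction=0.20):
    """Absolute gain in survival for a given baseline survival.

    The relative reduction applies to the baseline mortality, so the
    absolute gain grows as the baseline risk of dying grows.
    """
    baseline_mortality = 1.0 - baseline_survival
    return baseline_mortality * relative_risk_reduction

for survival in (0.95, 0.90, 0.80, 0.60):
    gain = absolute_benefit(survival)
    print(f"baseline {survival:.0%} -> with intervention "
          f"{survival + gain:.0%} (absolute benefit {gain:.0%})")
```

The same relative effect gives an eightfold difference in absolute benefit between the best and worst prognosis columns.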

Radiomics : What is it

A branch of computer vision that relies on “extracting” quantitative information from images.

Remember an “image” for a computer is a matrix of numbers.

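CT numbers are Hounsfield units (HU), which reflect X-ray attenuation (closely related to electron density)

MRI numbers represent signal intensity, which depends on proton density and tissue relaxation properties

PET numbers represent the radioactivity (tracer concentration) present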
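For CT, the number stored in each voxel follows the standard Hounsfield definition, where $\mu$ is the linear attenuation coefficient of the tissue in that voxel and $\mu_{\text{water}}$ that of water:

```latex
\mathrm{HU} = 1000 \times \frac{\mu - \mu_{\text{water}}}{\mu_{\text{water}}}
```

On this scale water is 0 HU and air is approximately -1000 HU.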

Images encode information

The relationships between the “numbers” in the matrix can be used to infer “patterns”.
These patterns often occur in “clusters”, which tend to be associated with specific tumor features.

Radiomics profile of NSCLC

Types of Radiomics studies

Handcrafted radiomics: These rely on extraction of known “features” from the matrix of numbers, followed by various statistical and machine learning algorithms to draw inferences.
Deep-learning based radiomics: These rely on the use of artificial neural networks to extract information from the image without the requirement for “feature” extraction.

Note

A “feature” is a way in which the numerical matrix is summarized.

Features in Radiomics

Feature Type	Description	Examples
Histogram based	Features of the distribution of the matrix of numbers	mean, range
Texture based	Pattern of change in the intensity in adjacent pixels (difference / similarity between adjacent numbers)	Gray level co-occurrence matrix (GLCM), Gray level run length matrix (GLRLM)
Shape based	Shape described by the numbers of interest	Shape elongation, Mesh surface
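To make the first two rows concrete, here is a minimal sketch on a hypothetical 4 x 4 image with four gray levels. It computes two histogram features and a simplified, horizontal-only and non-symmetric co-occurrence count. Radiomics packages typically average over several directions and distances, so treat this as an illustration of the idea rather than a reference implementation.

```python
from collections import Counter

# Hypothetical 4 x 4 "image": a matrix of gray levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

def histogram_features(img):
    """Summaries of the distribution of values, ignoring where they sit."""
    values = [v for row in img for v in row]
    return {"mean": sum(values) / len(values),
            "range": max(values) - min(values)}

def glcm_horizontal(img):
    """Count how often gray level i has gray level j as its right-hand neighbour."""
    counts = Counter()
    for row in img:
        for left, right in zip(row, row[1:]):
            counts[(left, right)] += 1
    return counts

def glcm_contrast(counts):
    """Average squared gray-level difference between neighbouring pixels."""
    total = sum(counts.values())
    return sum(n * (i - j) ** 2 for (i, j), n in counts.items()) / total

glcm = glcm_horizontal(image)
print(histogram_features(image))
print("GLCM contrast:", round(glcm_contrast(glcm), 3))
```

Shuffling the pixels would leave the histogram features unchanged but alter the texture feature, which is why the two families carry different information.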

Issues with feature extraction

Features are heavily dependent on the scan parameters, machine and the reconstruction algorithm used.
Currently available RTTQA recommendations for image quality are not sufficient for radiomics data quality assurance.

Issues with feature extraction (continued)

In this study, the investigators examined the impact of slice thickness and pixel spacing on radiomic feature variability.
As shown in the image, unless these parameters are harmonized, there is significant variability in a large number of radiomic features.

Feature selection

For handcrafted radiomics, more than 100 radiomic features can be extracted from images by common packages.
As a result, the dimensionality of the data can be much greater than the number of observations available.
Training models on such high-dimensional data tends to produce an overfitted model.
Hence feature selection is often needed.

Penalization and Shrinkage

These are techniques designed to prevent model overfitting.
They shrink the predictor effect estimates towards zero, and some of them eliminate variables whose estimates shrink all the way to zero.
As shown in the image to the right, at small sample sizes these methods can do the opposite of what they are supposed to do.
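The difference between shrinking and eliminating can be seen in the textbook special case of orthonormal predictors, where ridge and lasso have closed forms. The coefficient values and the penalty `lam` below are hypothetical, and the closed forms hold only under the orthonormal-design assumption and the penalty scalings stated in the comments.

```python
from math import copysign

def ridge_shrink(beta_ols, lam):
    """Ridge under an orthonormal design, objective ||y - Xb||^2 + lam * ||b||^2:
    every coefficient is scaled down by the same factor, none reaches zero."""
    return [b / (1.0 + lam) for b in beta_ols]

def lasso_shrink(beta_ols, lam):
    """Lasso under an orthonormal design, objective 0.5 * ||y - Xb||^2 + lam * ||b||_1:
    soft-thresholding moves each coefficient towards zero by lam and
    sets it to exactly zero once its size falls below lam."""
    shrunk = []
    for b in beta_ols:
        magnitude = max(abs(b) - lam, 0.0)
        shrunk.append(copysign(magnitude, b) if magnitude > 0 else 0.0)
    return shrunk

# Hypothetical unpenalized estimates for three features.
beta_ols = [2.0, -0.5, 0.1]
lam = 0.3

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_shrink(beta_ols, lam)
print("ridge:", [round(b, 3) for b in ridge])
print("lasso:", [round(b, 3) for b in lasso])
```

In practice the penalty has to be tuned from the data, which is itself unstable when the sample is small.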

Sample size requirements

AI : What is it

Simply put, AI is the development of machines that mimic human behavior and intelligence. Currently we have “Narrow AI”: agents designed to solve a specific problem.

Machine learning : Methods which learn from data using “automatic” analytic model building.
Deep learning: A subtype of machine learning which utilizes artificial neural networks to learn from data.

All DL is AI, but not all AI is DL

What do we mean by deep learning?

Input data is processed in multiple “layers” which are connected to each other.

Each layer has the ability to recognize / create “features”, e.g. shapes, color.
The process learns on its own which feature to place at which level.
The layers transform the data into the “outcome” we are interested in.
The network learns high-level features from the data.
Manual feature extraction can be eliminated.
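What a single convolutional layer does can be sketched with one hand-written filter. In a real network the kernel values are learned from data. Here the image and the vertical-edge kernel are hypothetical and fixed by hand, purely to show how a layer turns a matrix of numbers into a “feature”.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and take the
    weighted sum at each position, as a CNN layer does."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-set kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = relu(conv2d(image, vertical_edge))
print(feature_map)
```

The output is non-zero only where the edge sits. Stacking many such layers lets later layers combine simple responses into more complex features.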

Architecture of U-Net, a convolutional neural network

Deep learning for prognostication

True deep learning models for prognostication are uncommon.
Most use a convolutional neural network where the model is optimized to predict the hazard (this is done through the use of a different loss function).
Performance has been better than radiomics in limited datasets.
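The slide does not name the loss function. One commonly used choice, assumed here, is the negative Cox partial log-likelihood, in which the network output for each patient is treated as a log-risk score. The sketch below is a plain-Python version with hypothetical inputs and no special handling of tied event times.

```python
from math import exp, log

def cox_negative_log_partial_likelihood(risk_scores, times, events):
    """Average negative Cox partial log-likelihood.

    risk_scores: model output per patient (higher = higher hazard)
    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if censored
    """
    loss = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under follow-up at this event time.
        risk_set = sum(exp(r) for r, t in zip(risk_scores, times) if t >= t_i)
        loss -= risk_scores[i] - log(risk_set)
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return loss / n_events

# Hypothetical data: three patients, all with observed events.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
concordant = cox_negative_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
discordant = cox_negative_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
print(round(concordant, 3), round(discordant, 3))
```

The loss is lower when patients who die earlier are given higher risk scores, which is what training pushes the network towards.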

Sources of Bias in AI models

https://www.cdc.gov/pcd/issues/2024/24_0245.htm
Type	Reason	Mitigation
Experience & Expertise Bias	Poor-quality data, lack of expertise in algorithm development, and lack of provider familiarity with the model during implementation	Development team needs diversity. Data collection with standardized protocols. Ongoing training
Exclusion Bias	Missing or incomplete data or under-representation of population. Access to care for poor / marginalized groups needs special attention.	Dataset should be as inclusive as possible. Equity audits to identify population exclusion. Ensuring accessibility of tools for marginalized groups.

Sources of Bias in AI models (continued)

https://www.cdc.gov/pcd/issues/2024/24_0245.htm
Type	Reason	Mitigation
Environment Bias	Social and physical environmental data is not included.	Integrate information about the healthcare environment and socio-economic factors.
Empathy Bias	Missing qualitative and human experience in the model dataset. Patient preferences not taken into account	Design needs to be patient-centered, undergo ethics review, and ensure that appropriate ethical data is included
Evidence Bias	Funding priorities, publication bias, as well as inclusion in the evidence base without understanding social context	Diversification of funding, transparent reporting and developing inclusive guidelines

Learning encoded information

17,587 radiographs were used to classify fractures.
Models predicted scanner model, scanner brand and order priority better than they predicted fracture.
Best predictive ability was seen when image data were combined with patient clinical data and hospital process features.
In black-box DL models, these process variables may be unknowingly leveraged as variables for prediction, which may lower reliability.

Implementation Challenges

Model needs to be integrated with clinical EMR systems
Impact of model on healthcare staff (mental / psychological).
Treatment recommendations and medical decision making also imply medico-legal issues.

How to do it properly

Ensure that you are solving a problem that patients experience.
Register your study protocol (look at TRIPOD-AI reporting guidelines to understand all elements that you need to incorporate).
Sufficient sample size (more complex algorithm -> greater sample size) (see https://www.prognosisresearch.com/guidance-prognostic-models)
Data needs to be properly curated (avoid missing values and measurement errors, and ensure properly annotated outcomes).
Data should be prospectively collected (retrospective data is subject to bias that cannot be analyzed away). Ideally a multicenter collaboration.
Validation studies (internal and external). Report calibration. Perform a decision curve analysis.
If using DL models then it is important to be cognizant about issues related to model fairness and representation.
Before deploying, perform a randomized impact trial.
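The decision curve analysis mentioned in the checklist rests on the net benefit at a chosen risk threshold. A minimal sketch with hypothetical data follows. The threshold is the predicted risk at which clinician and patient would opt for the intervention, and false positives are weighted by the odds of that threshold.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone regardless of the model."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)

# Hypothetical cohort of 10 patients, 3 of whom have the outcome.
y_true = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.2, 0.1, 0.05]

for pt in (0.2, 0.5):
    print(pt,
          round(net_benefit(y_true, y_prob, pt), 3),
          round(net_benefit_treat_all(y_true, pt), 3))
```

A decision curve plots these values across a range of thresholds. The model is useful at thresholds where its net benefit exceeds both "treat all" and "treat none" (net benefit 0).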

Acknowledgements

This presentation is heavily influenced by the extensive work done on prognostic and predictive modelling at the PROGRESS framework website available at https://www.prognosisresearch.com

Conclusion

“We need less research, better research and research that makes a difference to patients' lives”

-Prof Altman

We invite you to join us at CHAVI and conduct meaningful and generalizable research using radiomics & AI. (https://chavi.ai)