What is it and how not to do it!
Tata Medical Center, Kolkata, WB
Radiomics ➡️ Prognostic factors OR Prognostic model
AI ➡️ Prognostic model
Metric | Definition | Measurement |
---|---|---|
Calibration | How close the model-predicted outcome is to the observed outcome | Calibration plots, RMSE |
Discrimination | How well the model can distinguish between those who do and do not have the outcome | AUC, C-index |
Clinical utility | Ability of the model to add to clinical decision making | Net benefit analysis, impact trial |
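The first two rows of the table can be illustrated with a minimal sketch. It assumes binary outcomes and predicted risks; the function names and the toy data are made up for illustration, and the RMSE shown (between mean predicted risk and observed event rate in equal-size risk groups) is only one possible reading of "RMSE" as a calibration summary.

```python
import numpy as np

def auc_rank(y_true, y_prob):
    """Discrimination: probability that a random patient with the outcome
    has a higher predicted risk than a random patient without it."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    # Compare every (event, non-event) pair; ties count as half
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def calibration_table(y_true, y_prob, n_bins=4):
    """Calibration: (mean predicted risk, observed event rate) per risk group."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    order = np.argsort(y_prob)  # sort patients by predicted risk
    return [
        (float(y_prob[idx].mean()), float(y_true[idx].mean()))
        for idx in np.array_split(order, n_bins)
    ]

def calibration_rmse(y_true, y_prob, n_bins=4):
    """Root mean squared gap between predicted and observed risk across groups."""
    rows = np.array(calibration_table(y_true, y_prob, n_bins))
    return float(np.sqrt(np.mean((rows[:, 0] - rows[:, 1]) ** 2)))

# Hypothetical toy data: 8 patients, 4 with the outcome
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]

auc = auc_rank(y_true, y_prob)
rmse = calibration_rmse(y_true, y_prob)
print(f"AUC = {auc:.4f}, calibration RMSE = {rmse:.4f}")
```

Plotting the pairs returned by `calibration_table` against the diagonal gives a basic calibration plot.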
Prognostic models aim to ASSIST clinicians in predicting future outcomes and HELP patients make a more informed choice regarding their treatment.
Let's say an intervention reduces the relative risk of mortality by 20%.
5 year survival | 95% | 90% | 80% | 60% |
---|---|---|---|---|
N dying out of 100 in 5 yrs | 5 | 10 | 20 | 40 |
Extra People surviving with intervention | 1 | 2 | 4 | 8 |
Survival at 5 years with intervention | 96% | 92% | 84% | 68% |
Absolute Benefit | 1% | 2% | 4% | 8% |
Randomized trials give very reliable and stable estimates of relative risks. However the absolute magnitude of benefit from the intervention depends on the baseline risk.
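The arithmetic behind the table above can be sketched as follows. The function name is hypothetical, and the number needed to treat (1 / absolute benefit) is an addition that does not appear in the table.

```python
def absolute_benefit(baseline_survival, relative_risk_reduction=0.20):
    """Absolute gain in survival for a given baseline survival,
    assuming a constant relative reduction in the risk of death."""
    baseline_mortality = 1.0 - baseline_survival
    return baseline_mortality * relative_risk_reduction

for survival in (0.95, 0.90, 0.80, 0.60):
    benefit = absolute_benefit(survival)
    print(
        f"baseline {survival:.0%}: "
        f"with intervention {survival + benefit:.0%}, "
        f"absolute benefit {benefit:.0%}, "
        f"number needed to treat {1 / benefit:.1f}"
    )
```

The same relative effect translates into an eightfold difference in absolute benefit between the best and worst prognostic groups.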
Prognostic models allow you to tell which of the 5-year survival bins your patient will fall into, allowing you and your patient to decide whether to give the intervention or not.
A branch of computer vision that relies on “extracting” quantitative information from images.
Remember an “image” for a computer is a matrix of numbers.
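CT numbers are in Hounsfield units (HU), which reflect X-ray attenuation (closely related to electron density)
MRI numbers represent signal intensity, which depends on proton density and on the relaxation properties emphasized by the sequence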
PET numbers represent the radioactivity present
Handcrafted radiomics: These rely on extraction of known “features” from the matrix of numbers and then applying various statistical and machine learning algorithms to derive inference.
Deep-learning based radiomics: These rely on the use of artificial neural networks to extract information from the image without the requirement for “feature” extraction.
Note
A “feature” is a way in which the numerical matrix is summarized.
Feature Type | Description | Examples |
---|---|---|
Histogram based | Features of the distribution of the matrix of numbers | mean, range |
Texture based | Pattern of change in the intensity in adjacent pixels (difference / similarity between adjacent numbers) | Gray level co-occurrence matrix (GLCM), Gray level run length matrix (GLRLM) |
Shape based | Shape described by the numbers of interest | Shape elongation, Mesh surface |
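As a minimal sketch of how a "feature" summarizes the matrix of numbers, the code below computes two histogram features and one texture feature from a tiny made-up region of interest. The function names are hypothetical, and the GLCM here counts only horizontal neighbours in one direction; radiomics packages usually discretize intensities first and combine several directions, so their values will differ.

```python
import numpy as np

def histogram_features(roi):
    """Histogram-based features: summaries of the distribution of values."""
    roi = np.asarray(roi, dtype=float)
    return {"mean": float(roi.mean()), "range": float(roi.max() - roi.min())}

def glcm_horizontal(roi, levels):
    """Count how often gray level i has gray level j as its right-hand neighbour."""
    roi = np.asarray(roi)
    glcm = np.zeros((levels, levels), dtype=int)
    left = roi[:, :-1].ravel()   # each pixel except the last column
    right = roi[:, 1:].ravel()   # its neighbour one pixel to the right
    np.add.at(glcm, (left, right), 1)
    return glcm

def glcm_contrast(glcm):
    """Texture feature: large when neighbouring pixels differ strongly."""
    p = glcm / glcm.sum()        # convert counts to joint probabilities
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

# Hypothetical 4 x 4 region of interest with 4 gray levels (0 to 3)
roi = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
])

features = histogram_features(roi)
glcm = glcm_horizontal(roi, levels=4)
contrast = glcm_contrast(glcm)
print(features, contrast)
```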
In this study, the investigators examined the impact of slice thickness and pixel spacing on radiomic feature variability.
As shown in the image, unless these parameters are harmonized, there is significant variability in a large number of radiomic features.
For handcrafted radiomics, more than 100 radiomic features can be extracted from images by common packages.
As a result, the dimensionality of the data far exceeds the number of observations available.
Training models on such high-dimensional data results in overfitted models.
Hence, feature selection is often needed.
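The overfitting problem can be seen in a small simulation. This is a sketch under stated assumptions: 100 simulated "features" and an outcome that are both pure random noise, 30 patients for training and 30 for testing, and an ordinary least-squares fit. None of these numbers come from a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 30, 30, 100   # far more features than patients

# Features and outcome are independent noise: there is nothing to learn
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = rng.normal(size=n_train)
y_test = rng.normal(size=n_test)

# Least-squares fit; with more features than patients it can fit noise exactly
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def r_squared(y, y_hat):
    """Proportion of outcome variance explained by the predictions."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

train_r2 = r_squared(y_train, X_train @ coef)
test_r2 = r_squared(y_test, X_test @ coef)
print(f"R^2 on training data: {train_r2:.3f}")
print(f"R^2 on new data:      {test_r2:.3f}")
```

The model appears to explain the training data almost perfectly even though no true relationship exists, and that apparent performance does not carry over to new patients.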
Shrinkage (penalization) techniques are designed to prevent overfitting.
They shrink the predictor effect estimates towards zero and eliminate variables based on that.
As shown in the image to the right, at small sample sizes these techniques can do the opposite of what they are supposed to do.
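The "shrink towards zero and eliminate" mechanism can be sketched with an L1 penalty solved by coordinate descent. This assumes the technique in question is LASSO-type penalized regression, which the slide does not name. The function names, the simulated data and the penalty values are all made up, and the sketch shows only the mechanism, not the small-sample instability discussed above.

```python
import numpy as np

def soft_threshold(value, penalty):
    """Shrink a value towards zero; set it to exactly zero if it is small."""
    return np.sign(value) * max(abs(value) - penalty, 0.0)

def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimize (1 / 2n) * ||y - X b||^2 + penalty * ||b||_1, one coefficient at a time."""
    n, p = X.shape
    coef = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j added back
            partial_residual = y - X @ coef + X[:, j] * coef[j]
            rho = X[:, j] @ partial_residual / n
            coef[j] = soft_threshold(rho, penalty) / col_scale[j]
    return coef

# Simulated data: only the first of 5 features truly affects the outcome
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

coef_mild = lasso_coordinate_descent(X, y, penalty=0.5)
coef_strong = lasso_coordinate_descent(X, y, penalty=10.0)
print("mild penalty:  ", np.round(coef_mild, 2))
print("strong penalty:", np.round(coef_strong, 2))
```

With the mild penalty the true effect is shrunk below 3 and the noise features are dropped; with the very strong penalty every coefficient is eliminated. The choice of penalty is what becomes unreliable when it has to be estimated from a small sample.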
Simply put, AI is the development of machines that mimic human behavior and intelligence. Currently we have “Narrow AI”: agents designed to solve a specific problem.
Machine learning: Methods which learn from data using “automatic” analytic model building.
Deep learning: A subtype of machine learning which utilizes artificial neural networks to learn from data.
All DL is AI, but not all AI is DL
Each layer has the ability to recognize / create “features”, e.g. shapes, color.
The process learns on its own which feature to place on which level.
The layers transform the data into the “outcome” we are interested in.
Learn high-level features from data
Manual feature extraction can be eliminated.
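How layers transform the matrix of numbers into an outcome can be sketched with a tiny two-layer network. Everything here is an assumption for illustration: the 8 x 8 random "image", the layer sizes and the random weights. In a real model the weights are learned from data, and image models usually use convolutional layers rather than the fully connected ones shown.

```python
import numpy as np

def relu(x):
    """Keep positive activations, set negative ones to zero."""
    return np.maximum(x, 0.0)

def sigmoid(x):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical 8 x 8 image: just a matrix of numbers
image = rng.random((8, 8))
x = image.ravel()                       # flatten to a vector of 64 values

# Untrained, randomly initialized weights (training would adjust these)
W1 = rng.normal(scale=0.1, size=(16, 64))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(1, 16))
b2 = np.zeros(1)

hidden = relu(W1 @ x + b1)              # layer 1: 16 intermediate "features"
risk = sigmoid(W2 @ hidden + b2)        # layer 2: a single predicted risk
predicted_risk = float(risk[0])
print(hidden.shape, predicted_risk)
```

No feature was specified by hand: the 16 intermediate values play the role of the handcrafted features, and training decides what they represent.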
Type | Reason | Mitigation |
---|---|---|
Experience & Expertise Bias | Poor quality data, lack of expertise in algorithm development and lack of provider familiarity with the model during implementation | Development team needs diversity. Data collection with standardized protocols. Ongoing training |
Exclusion Bias | Missing or incomplete data or under-representation of population. Access to care for the poor / marginalized needs special attention. | Dataset should be as inclusive as possible. Equity audits to identify population exclusion. Ensuring accessibility of tools for the marginalized. |
Environment Bias | Social and physical environmental data is not included. | Integrate information about the healthcare environment and socio-economic factors. |
Empathy Bias | Missing qualitative and human experience in the model dataset. Patient preferences not taken into account | Design needs to be patient-centered and reviewed by ethics, and should ensure that appropriate ethical data is included |
Evidence Bias | Funding priorities, publication bias as well as inclusion in the evidence base without understanding social context | Diversification of funding, transparent reporting and developing inclusive guidelines |
17,587 radiographs were used to classify fractures.
Models predicted scanner model, scanner brand and order priority better than they predicted fracture.
Predictive ability was best when image data were combined with patient clinical data and hospital process features.
In black-box DL models, these process variables may be unknowingly leveraged as predictors, which may lower reliability.
Model needs to be integrated with clinical EMR systems
Impact of model on healthcare staff (mental / psychological).
Treatment recommendations and medical decision making also imply medico-legal issues.
Ensure that you are solving a problem that patients experience.
Register your study protocol (look at TRIPOD-AI reporting guidelines to understand all elements that you need to incorporate).
Sufficient sample size (more complex algorithm -> greater sample size) (see https://www.prognosisresearch.com/guidance-prognostic-models)
Data needs to be properly curated (avoid missing values and measurement errors, and annotate outcomes properly).
Data should be prospectively collected (retrospective data are subject to bias that cannot be analyzed away). Ideally a multicenter collaboration.
Validation studies (internal and external). Report calibration. Perform a decision curve analysis.
If using DL models then it is important to be cognizant about issues related to model fairness and representation.
Before deploying, perform a randomized impact trial.
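The decision curve analysis mentioned in the recommendations above rests on the net benefit at a chosen risk threshold. A minimal sketch, assuming binary outcomes and predicted risks; the function names and toy data are made up, and a full decision curve would repeat this over a range of clinically reasonable thresholds.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold.
    False positives are weighted by the odds of the threshold."""
    y_true = np.asarray(y_true)
    treat = np.asarray(y_prob, dtype=float) >= threshold
    n = y_true.size
    true_pos = np.sum(treat & (y_true == 1))
    false_pos = np.sum(treat & (y_true == 0))
    return float(true_pos / n - false_pos / n * threshold / (1.0 - threshold))

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = float(np.mean(y_true))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical toy data: 8 patients, 4 with the outcome
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]

nb_model = net_benefit(y_true, y_prob, threshold=0.2)
nb_all = net_benefit_treat_all(y_true, threshold=0.2)
print(f"model: {nb_model:.5f}, treat all: {nb_all:.5f}, treat none: 0")
```

A model is clinically useful at a threshold only if its net benefit exceeds both reference strategies, treat all and treat none.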
This presentation is heavily influenced by the extensive work on prognostic and predictive modelling from the PROGRESS framework, available at https://www.prognosisresearch.com
“We need less research, better research and research that makes a difference to patients' lives”
-Prof Altman
We invite you to join us at CHAVI and conduct meaningful and generalizable research using radiomics & AI. (https://chavi.ai)