Challenges, solutions & opportunities
Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova
Randomised Controlled Trials (RCTs) are considered the gold standard for causal inference.
However, RCTs are not always feasible.
Conversely, observational studies are characterized by the observation and systematic recording of natural occurrences.
Observation of phenomena as they naturally exist.
Insights without disrupting their natural flow.
Multiple data sources for observational studies exist.
Among these, three important sources stand out:
Electronic Health Records (EHRs)
Aggregate Data
Individual Data Registries
EHRs are digital repositories of patients’ health information and contain a wealth of data.
Aggregate data consist of summarized information collected from various sources.
Observational studies using aggregate data can focus on epidemiological analyses, public health surveillance, or studying trends across diverse populations or regions.
Individual data registries contain detailed information about a specific population, condition, or disease.
Addressing biases in observational studies involves multiple themes:

While it’s challenging to completely eliminate biases in observational studies, researchers aim to minimize their effects to ensure the validity and reliability of their findings.
A solution for minimizing such biases must necessarily consider multiple aspects.
Within this scenario, the N/P ratio plays a fundamental role.
The “N/P ratio” refers to the ratio between the sample size (N) and the number of predictors or features (P) used in a statistical model.
The N/P ratio influences the model’s ability to generalize from the sample to the larger population and affects the stability and accuracy of the statistical estimates.
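As a minimal sketch of the definition above, the ratio can be computed directly from the dimensions of a dataset. The helper name `np_ratio` and the two dataset sizes are hypothetical, chosen only to contrast a large and a small ratio; they are not figures from this work.

```python
def np_ratio(n_samples: int, n_predictors: int) -> float:
    """Return N/P: sample size divided by the number of predictors."""
    if n_predictors <= 0:
        raise ValueError("the number of predictors must be positive")
    return n_samples / n_predictors


# Hypothetical dataset sizes (N, P), for illustration only.
datasets = {
    "EHR-like cohort": (50_000, 25),
    "Aggregate-like data": (20, 10),
}

for name, (n, p) in datasets.items():
    print(f"{name}: N = {n}, P = {p}, N/P = {np_ratio(n, p):.1f}")
```

With the same number of observations, adding predictors lowers the ratio, which is why the ratio, rather than N alone, is the quantity tracked here.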
In EHRs, the N/P ratio is usually large.
A large N/P ratio refers to a scenario in statistical analysis where the number of samples (N) is significantly larger than the number of predictors or features (P) in a dataset.
However, a high N/P ratio might not always be optimal.
In particular:
Conversely, the N/P ratio in aggregate data is often small.
A small N/P ratio refers to a scenario in statistical analysis where the number of samples (N) is relatively small compared to the number of predictors or features (P) in a dataset. In contrast to a high N/P ratio:
Finally, the N/P ratio in individual data registries can be described as a deep N/P ratio.
The aim of this thesis is to explore the possibilities that arise from observational studies, using four different projects to illustrate the main concept.
Addressing a small N/P ratio involves strategic approaches to mitigate limitations and enhance the reliability of analyses.
Systematic reviews with meta-analysis represent a possible statistical answer in case of small N/P ratio.
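To illustrate how a meta-analysis borrows strength across small studies, the sketch below pools study-level effect estimates with inverse-variance weights, first under a fixed-effect model and then under a DerSimonian-Laird random-effects model. This is a generic textbook calculation, not the analysis performed in this work; the function names and the input values in the usage lines are hypothetical.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects estimate, standard error and tau^2."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_fe, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    k = len(effects)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    pooled_re = sum(w * y for w, y in zip(re_weights, effects)) / re_total
    return pooled_re, math.sqrt(1.0 / re_total), tau2


# Hypothetical study-level effects (e.g. log odds ratios) and their variances.
effects = [0.2, 0.4]
variances = [0.01, 0.01]
print(fixed_effect(effects, variances))
print(random_effects_dl(effects, variances))
```

When the studies disagree more than their sampling error would suggest, the estimated between-study variance is positive and the random-effects standard error is wider than the fixed-effect one.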
In collaboration with the surgical team at the Padova hospital, we carried out a systematic review of existing predictive models for mortality in the critical care unit after ECMO initiation.
PubMed, CINAHL, Embase, MEDLINE and Scopus were consulted.
Most models have not been validated externally, and it remains uncertain whether ECMO should be initiated.
It has yet to be determined whether and to what extent a new methodological perspective may enhance the performance of predictive models for ECMO.
Having a large N/P ratio can be beneficial for machine learning techniques, especially in enhancing generalization and reducing overfitting.
In 2022, with the help of PEDIANET, we set out to develop a machine learning approach that exploits electronic health records (EHRs) as a source of real-world health data.
This work discusses the development and application of a deep learning model, particularly a recurrent neural network with gated recurrent units (RNN-GRU), for the automatic extraction of information from EHRs to assess VZV infections in children.
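For readers unfamiliar with gated recurrent units, the following is a minimal sketch of a single GRU update and of how a sequence of token vectors is folded into one hidden state. It is written in plain Python for readability and is not the model trained in this work: the parameter names (`Wz`, `Uz`, `bz`, and so on), the dimensions and the zero-valued weights in the usage lines are all assumptions for illustration, and libraries differ on which of the two blended terms the update gate multiplies.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, p):
    """One GRU update: combine input x with previous hidden state h."""
    # Update gate: how much of the candidate state enters the new state.
    z = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    # Reset gate: how much of the previous state feeds the candidate.
    r = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    # Candidate hidden state.
    h_tilde = [math.tanh(a + b + c)
               for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], reset_h), p["bh"])]
    # Blend old state and candidate (one common convention).
    return [(1.0 - zi) * hi + zi * ti for zi, hi, ti in zip(z, h, h_tilde)]


def encode_sequence(xs, hidden_size, p):
    """Run the GRU over a sequence of input vectors, starting from zeros."""
    h = [0.0] * hidden_size
    for x in xs:
        h = gru_step(x, h, p)
    return h


def zero_params(input_size, hidden_size):
    """Hypothetical all-zero parameters, used only to exercise the code."""
    w = lambda cols: [[0.0] * cols for _ in range(hidden_size)]
    b = lambda: [0.0] * hidden_size
    return {"Wz": w(input_size), "Uz": w(hidden_size), "bz": b(),
            "Wr": w(input_size), "Ur": w(hidden_size), "br": b(),
            "Wh": w(input_size), "Uh": w(hidden_size), "bh": b()}


params = zero_params(input_size=3, hidden_size=2)
print(gru_step([1.0, 2.0, 3.0], [1.0, -2.0], params))
```

In a text classifier of this kind, each `x` would typically be the embedding of one token from a clinical note, and the final hidden state would be passed to an output layer that predicts the label.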
In the presence of a high N/P ratio, deep recurrent neural network architectures offer several advantages:
Gold standard:
ML approach:
10 models were trained in total:
Final dissertation