03_EDA_datasets_itu

Author

Sergio Uribe

Modified

June 14, 2024

Packages

Datasets

Data cleaning

How many datasets?

16 datasets (rows), 62 variables (columns)

From which databases?

Database n percent
Kaggle 3 18.8%
Github 2 12.5%
Google Datasets 2 12.5%
Mendeley 2 12.5%
PubMed 2 12.5%
Zenodo 2 12.5%
Grand-challenge 1 6.2%
OSF 1 6.2%
arXiv 1 6.2%
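
The percent column in a table like this one is each count divided by the total number of datasets. A minimal Python sketch, reusing the counts above; the function name `freq_table` is illustrative and is not the tooling that produced this report:

```python
# Counts per repository, copied from the table above.
counts = {
    "Kaggle": 3, "Github": 2, "Google Datasets": 2, "Mendeley": 2,
    "PubMed": 2, "Zenodo": 2, "Grand-challenge": 1, "OSF": 1, "arXiv": 1,
}

def freq_table(counts):
    """Return (label, n, percent) rows, largest count first."""
    total = sum(counts.values())
    rows = [(label, n, 100 * n / total) for label, n in counts.items()]
    # sorted() is stable, so tied counts keep their original order.
    return sorted(rows, key=lambda row: -row[1])

table = freq_table(counts)
for label, n, pct in table:
    print(f"{label} {n} {pct:.1f}%")
```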

Year of dataset publication

Year n percent
2020 1 6.2%
2021 2 12.5%
2022 6 37.5%
2023 7 43.8%

Associated with publication?

Paper associated n percent
No 5 31.2%
Yes 11 68.8%

Areas of research

Value n percent
Teeth segmentation 10 34.5%
Teeth labeling 9 31.0%
Caries 3 10.3%
Oral Pathology 3 10.3%
Endodontics 1 3.4%
Cephalometric 1 3.4%
Endodontics 1 3.4%
Oral Surgery 1 3.4%
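
Endodontics is listed twice above. One possible cause, which is an assumption and not something the source confirms, is that the multiple-choice labels differ only by stray whitespace or letter case and are therefore counted as separate values. A sketch of normalising labels before counting, with hypothetical input strings:

```python
from collections import Counter

def count_areas(cells, sep=","):
    """Split multiple-choice cells and count normalised labels."""
    counter = Counter()
    for cell in cells:
        for label in cell.split(sep):
            label = " ".join(label.split())  # trim and collapse whitespace
            if label:
                # capitalize() lowercases everything after the first letter.
                counter[label.capitalize()] += 1
    return counter

# Hypothetical raw cells, not taken from the dataset.
cells = ["Teeth segmentation, Endodontics ", "endodontics", "Caries,Teeth segmentation"]
areas = count_areas(cells)
total = sum(areas.values())
for label, n in areas.most_common():
    print(f"{label} {n} {100 * n / total:.1f}%")
```

Percentages here are computed over label mentions, not over datasets, which matches the table above (its counts add up to 29 across 16 datasets).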

Imaging modality

Value n percent
Panoramic radiographs 10 58.8%
Cone Beam Computed Tomography (CBCT) 2 11.8%
Intraoral photograph 2 11.8%
Cephalometric radiographs 1 5.9%
Intra-oral 3D scans 1 5.9%
Intraoral radiographs 1 5.9%

Images amount analysis

Image type sum median average sd
CBCT 557 278.5 278.5 156.3
Intraoral radiographs 757 188.0 252.3 241.0
Other 3781 925.0 945.2 731.6
Panoramic 5355 180.0 595.0 790.1
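
The four statistics are the sum, median, mean and sample standard deviation of images per dataset within each image type. As a check, here is a sketch using Python's standard library. The two CBCT dataset sizes (168 and 389) are back-calculated from the reported sum and sd, not read from the source data, so treat them as an assumption:

```python
import statistics

def summarise(values):
    """Sum, median, mean and sample standard deviation (n - 1 denominator)."""
    return {
        "sum": sum(values),
        "median": statistics.median(values),
        "average": statistics.mean(values),
        "sd": statistics.stdev(values),
    }

cbct = [168, 389]  # assumed values, reconstructed from sum = 557 and sd = 156.3
summary = summarise(cbct)
print({name: round(value, 1) for name, value in summary.items()})
```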

Patients per dataset and country

Imaging modality (multiple choices) n mean_patients sd_patients
Cephalometric radiographs 1 NA NA
Cone Beam Computed Tomography (CBCT) 2 NA 259.5082
Intra-oral 3D scans 1 NA NA
Intraoral photograph 2 NA NA
Panoramic radiographs 6 NA 444.1859
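
Every mean_patients value is NA while some sd_patients values are present. A possible explanation, offered only as an assumption, is that the patient counts were stored as text or contained missing entries when the mean was computed. The sketch below coerces entries to numbers and drops unusable ones before summarising; the input values are hypothetical:

```python
import statistics

def to_number(value):
    """Convert a raw cell to float, or return None if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def patient_summary(raw_values):
    """Mean and sd of the usable values; None where they cannot be computed."""
    values = [v for v in map(to_number, raw_values) if v is not None]
    mean = statistics.mean(values) if values else None
    sd = statistics.stdev(values) if len(values) > 1 else None
    return len(values), mean, sd

# Hypothetical patient counts, including a missing and a non-numeric entry.
raw = ["120", None, "80", "not reported"]
n_used, mean_patients, sd_patients = patient_summary(raw)
print(n_used, mean_patients, sd_patients)
```

Returning the number of usable values alongside the mean makes it visible when a group, such as a modality with a single dataset, has too few values for a standard deviation.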

MAP

Distribution by country

How many countries?

value n percent
CHN 4 22.2%
IRN 2 11.1%
USA 2 11.1%
BEL 1 5.6%
CHE 1 5.6%
ESP 1 5.6%
FRA 1 5.6%
IND 1 5.6%
KOR 1 5.6%
PRY 1 5.6%
SAU 1 5.6%
TUN 1 5.6%
TWN 1 5.6%

Country n percent
China 4 22.2%
Iran 2 11.1%
United States 2 11.1%
Belgium 1 5.6%
Switzerland 1 5.6%
Spain 1 5.6%
France 1 5.6%
India 1 5.6%
South Korea 1 5.6%
Paraguay 1 5.6%
Saudi Arabia 1 5.6%
Tunisia 1 5.6%
Taiwan 1 5.6%

By numbers of images

country n
China 2413
Switzerland 2332
Belgium 1800
France 1800
Iran 1504
United States 1117
Taiwan 600
Tunisia 180
Paraguay 135
India 131
Saudi Arabia 50
South Korea 50
Spain 50

Number of images per repository source

Database n images sd_images img_per_dataset
Kaggle 3 2653 652.5 884.3
Github 2 263 79.9 131.5
Google Datasets 2 1037 567.8 518.5
Mendeley 2 412 36.8 206.0
PubMed 2 1050 671.8 525.0
Zenodo 2 2467 1553.5 1233.5
Grand-challenge 1 600 NA 600.0
OSF 1 1800 NA 1800.0
arXiv 1 168 NA 168.0
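
img_per_dataset is the total number of images from a repository divided by its number of datasets. A short check with the values in the table; the variable names are illustrative:

```python
# (database, number of datasets, total images), copied from the table above.
rows = [
    ("Kaggle", 3, 2653), ("Github", 2, 263), ("Google Datasets", 2, 1037),
    ("Mendeley", 2, 412), ("PubMed", 2, 1050), ("Zenodo", 2, 2467),
    ("Grand-challenge", 1, 600), ("OSF", 1, 1800), ("arXiv", 1, 168),
]

img_per_dataset = {name: round(images / n, 1) for name, n, images in rows}
total_images = sum(images for _, _, images in rows)

for name, value in img_per_dataset.items():
    print(name, value)
print("Total images:", total_images)
```

The total of 10450 images agrees with the sum over image types reported in the images amount analysis (557 + 757 + 3781 + 5355).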

Table of images, datasets per imaging modality

Image type n sum_images sd_images
CBCT 2 557 156.3
Intraoral radiographs 3 757 241.0
Other 4 3781 731.6
Panoramic 9 5355 790.1

Metadata analysis

name Percentage Yes
Contain annotations 75.0
Associated to paper 68.8
Type of image processing 68.8
Annotation tool reported 66.7
Anatomic segmentation 62.5
Patients number 62.5
Ground truth method explanation 56.2
Annotators experience reported 53.8
Ground truth definition 50.0
Image processing 50.0
Lesion segmentation 50.0
Anonymization strategy 43.8
Equipment used 43.8
Ethical approval 31.2
Patient inclusion/exclusion criteria 31.2
Annotators calibration 18.8
Patient sex distribution 18.8
Annotator dispute handling 16.7
Annotators training 15.4
Annotator age reporting 7.7
Patient consent 6.2
Calibration metric reported 0.0
Patient ethnicity 0.0
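
Several percentages above, such as 66.7 and 53.8, cannot be a whole number of datasets out of 16, which suggests that some items were scored only on the datasets where they apply. That reading is an assumption. A sketch of a percentage-Yes calculation that skips not-applicable entries, with hypothetical answers:

```python
def percentage_yes(answers):
    """Percent of 'Yes' among applicable answers; None marks not applicable."""
    applicable = [a for a in answers if a is not None]
    if not applicable:
        return None
    yes = sum(1 for a in applicable if a == "Yes")
    return round(100 * yes / len(applicable), 1)

# Hypothetical item: reported by 8 of 12 annotated datasets,
# not applicable to 4 datasets without annotations.
annotation_tool = ["Yes"] * 8 + ["No"] * 4 + [None] * 4
print(percentage_yes(annotation_tool))
```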

Colors for yes/no

FAIR ANALYSIS

name n mean sd
Findable 16 56.2 27.6
Accessible 16 60.4 37.0
Interoperable 16 50.0 37.6
Reusable 16 41.9 19.7

Final Table

Imaging n images
Cephalometric radiographs 1 600
Cone Beam Computed Tomography (CBCT) 2 1088
Intraoral 3D Scans or images 4 4981
Intraoral radiographs 3 150
Panoramic radiographs 9 6263

Imaging n
Cephalometric radiographs 1
Cone Beam Computed Tomography (CBCT) 2
Intraoral 3D Scans or images 4
Intraoral radiographs 3
Panoramic radiographs 9

Image type n sum mean sd
CBCT 2 557 278.5 156.3
Intraoral radiographs 3 757 252.3 241.0
Panoramic 9 5355 595.0 790.1

Imaging BEL CHE CHN ESP FRA IND IRN KOR PRY SAU TUN TWN USA
Cephalometric radiographs 0 0 0 0 0 0 0 0 0 0 0 1 0
Cone Beam Computed Tomography (CBCT) 0 0 2 0 0 0 0 0 0 0 0 0 0
Intraoral 3D Scans or images 1 0 1 0 1 1 0 0 0 0 0 0 0
Intraoral radiographs 0 0 0 1 0 0 0 1 0 1 0 0 0
Panoramic radiographs 0 1 2 0 0 0 2 0 1 0 1 0 2

Imaging Accessible (max 3) FAIRness Findable (max 7) Interoperable (max 4) Reusable (max 10)
Other 1.8 45.8 3.5 1.6 4.2
Cone Beam Computed Tomography (CBCT) 2.5 65.0 4.8 3.5 5.0
Intraoral radiographs 2.0 75.0 7.0 3.0 6.0
Panoramic radiographs 1.7 48.4 4.1 2.1 3.9
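
The column headers give the maximum score per principle (3, 7, 4 and 10, so 24 points in total). The FAIRness column appears to be the total score as a percentage of that maximum. This is inferred from the table, where the Intraoral radiographs row reproduces exactly, and is not stated in the source. A sketch under that assumption:

```python
# Maximum points per FAIR principle, taken from the column headers.
MAX_POINTS = {"findable": 7, "accessible": 3, "interoperable": 4, "reusable": 10}

def fairness_percent(scores):
    """Total FAIR score as a percentage of the maximum attainable score."""
    total = sum(scores[name] for name in MAX_POINTS)
    return 100 * total / sum(MAX_POINTS.values())

# Mean scores of the Intraoral radiographs row.
intraoral = {"findable": 7.0, "accessible": 2.0, "interoperable": 3.0, "reusable": 6.0}
print(round(fairness_percent(intraoral), 1))
```

The other rows agree only approximately with this formula, which would be expected if FAIRness was averaged per dataset before the component means were rounded.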