RetinaAI -

Author

Boris

Data loading and preprocessing

As the data preview below shows, some observations have missing values.

No. missing values	patient	age	sex	systemic_background
33	7	NA	NA	NA
33	13	NA	NA	NA
33	37	NA	NA	NA
33	38	NA	NA	NA
33	39	NA	NA	NA
33	40	NA	NA	NA

Below is the distribution of the number of missing entries per patient.
There is a clear gap between patients with many missing values (30 or more) and those with few (4 or fewer).
For the excluded patients, age is also missing (NA), so a missing age coincides with a high count of missing values.
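The per-patient count and the split at the cutoff can be sketched as follows. This is a minimal stdlib Python sketch, assuming each patient is a dict of field values with None standing in for NA; the threshold of 30 comes from the cutoff described above, while count_missing, split_by_missingness and MISSING_CUTOFF are hypothetical names.

```python
from typing import Any

# Cutoff taken from the distribution described above: patients with 30 or
# more missing values form the high-missingness group.
MISSING_CUTOFF = 30


def count_missing(record: dict[str, Any]) -> int:
    """Count the fields of one patient record whose value is None (NA)."""
    return sum(value is None for value in record.values())


def split_by_missingness(
    records: list[dict[str, Any]], cutoff: int = MISSING_CUTOFF
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (kept, excluded), excluding records with >= cutoff missing fields."""
    kept: list[dict[str, Any]] = []
    excluded: list[dict[str, Any]] = []
    for record in records:
        target = excluded if count_missing(record) >= cutoff else kept
        target.append(record)
    return kept, excluded
```

Because the observed counts are either 4 or fewer or 30 or more, any threshold between 5 and 30 would give the same split.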

The following plots describe the missingness in the data provided (results 16.5.23.xlsx).

Values of sex are converted to lowercase, and one observation with the value “HD, S/P MI/ PASCEMAKER”, which is not a valid sex entry, is set to NA for now.
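A minimal stdlib Python sketch of this step, assuming each entry is a string or None (standing in for NA). The function name clean_sex and the set INVALID_SEX_VALUES are hypothetical; the invalid value is copied verbatim from the data.

```python
from typing import Optional

# The single invalid entry reported in the text, kept verbatim from the data.
INVALID_SEX_VALUES = {"HD, S/P MI/ PASCEMAKER"}


def clean_sex(raw: Optional[str]) -> Optional[str]:
    """Return the lowercased sex value, or None (NA) if missing or invalid."""
    if raw is None or raw in INVALID_SEX_VALUES:
        return None
    return raw.lower()
```

The invalid value is checked before lowercasing, so the set can hold the entry exactly as it appears in the spreadsheet.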

I created a dictionary for cleaning the data.
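The dictionary itself is not reproduced in this report. Assuming it maps raw spellings to canonical labels, applying it could look like the minimal stdlib Python sketch below; CLEANING_MAP, its entries and clean_value are hypothetical placeholders, not the author’s mapping.

```python
from typing import Optional

# Hypothetical entries: normalised raw spelling -> canonical label.
# The real cleaning dictionary is not shown in this report.
CLEANING_MAP = {
    "dm": "diabetes mellitus",
    "diabetes": "diabetes mellitus",
    "htn": "hypertension",
}


def clean_value(
    raw: Optional[str], mapping: dict[str, str] = CLEANING_MAP
) -> Optional[str]:
    """Strip and lowercase a raw entry, then map known variants to one label.

    Entries not found in the mapping are returned normalised but otherwise
    unchanged; None (NA) passes through.
    """
    if raw is None:
        return None
    key = raw.strip().lower()
    return mapping.get(key, key)
```

Normalising before the lookup keeps the dictionary small, since case and whitespace variants of the same spelling share one key.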

For each patient, we determine whether Chat’s response was exactly the same as the human’s.
This raises the question of how to define the “difference” or “distance” between Chat and the human.

Each comparison is assigned to one of four categories, illustrated here with abstract diagnosis labels:

Full Match: The diagnosis generated by the AI (Chat) exactly matches the diagnosis made by the medical professional (human). For instance, if both the physician and the AI diagnose “A, B, C”, it is a full match.
Partial Match - Human Subset: Every diagnosis identified by the AI is also in the physician’s diagnosis, but the AI missed at least one condition. For example, if the physician diagnoses “A, B, C, D” and the AI diagnoses “A, B, C”, the AI missed condition “D”. The AI’s diagnosis is then a subset of the human’s.
Partial Match - Chat Subset: The AI identifies every condition diagnosed by the physician, plus at least one more. For instance, if the physician diagnoses “A, B, C” and the AI diagnoses “A, B, C, D”, the AI over-diagnosed condition “D”. The human’s diagnosis is then a subset of the AI’s.
Mismatch: The AI’s and the physician’s diagnoses have no condition in common. For example, if the physician diagnoses “A, B, C” and the AI diagnoses “D, E, F”, there is no overlap, so it is a mismatch.

This classification measures how closely the AI’s diagnoses align with those of the medical professional.
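The four categories can be expressed as set comparisons. The sketch below is a minimal stdlib Python version, assuming each diagnosis is stored as a comma-separated string of labels; parse_labels and classify_match are hypothetical names. The returned strings follow the category names used in the summary tables (chat_partial, full_match, human_partial, mismatch), and pairing human_partial with “Human Subset” and chat_partial with “Chat Subset” is an assumption based on the names. The four definitions do not cover pairs that share some labels without either containing the other (for example “A, B, C” versus “A, B, D”); this sketch counts them as mismatch, which is also an assumption.

```python
def parse_labels(text: str) -> frozenset[str]:
    """Split a comma-separated diagnosis string into a set of normalised labels."""
    return frozenset(part.strip().lower() for part in text.split(",") if part.strip())


def classify_match(human: str, chat: str) -> str:
    """Assign one of the four match categories to a human/Chat pair.

    Assumption: partial overlaps where neither set contains the other are
    reported as "mismatch".
    """
    h, c = parse_labels(human), parse_labels(chat)
    if h == c:
        return "full_match"
    if c < h:  # Chat's labels are a proper subset of the human's: Chat missed some
        return "human_partial"
    if h < c:  # the human's labels are a proper subset of Chat's: Chat added some
        return "chat_partial"
    return "mismatch"
```

Using proper-subset comparisons (`<`) keeps the categories mutually exclusive, since equal sets are caught by the full-match check first.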

Characteristic	Overall, N = 190¹	left, N = 95¹	right, N = 95¹
diagnosis_match
chat_partial	3 (1.6%)	1 (1.1%)	2 (2.1%)
full_match	76 (40%)	39 (41%)	37 (39%)
human_partial	20 (11%)	8 (8.4%)	12 (13%)
mismatch	91 (48%)	47 (49%)	44 (46%)
¹ n (%)

Characteristic	Overall, N = 190¹	left, N = 95¹	right, N = 95¹
treatment_match
chat_partial	25 (13%)	21 (22%)	4 (4.2%)
full_match	66 (35%)	36 (38%)	30 (32%)
human_partial	85 (45%)	32 (34%)	53 (56%)
mismatch	14 (7.4%)	6 (6.3%)	8 (8.4%)
¹ n (%)

Characteristic	Overall, N = 190¹	left, N = 95¹	right, N = 95¹
protocol_match
full_match	149 (78%)	76 (80%)	73 (77%)
human_partial	3 (1.6%)	2 (2.1%)	1 (1.1%)
mismatch	38 (20%)	17 (18%)	21 (22%)
¹ n (%)
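The n (%) cells in these tables can be reproduced from per-eye category labels. The following is a minimal stdlib Python sketch, not the code that produced the tables; it assumes each comparison is available as an (eye, category) pair with eye equal to "left" or "right", and format_n_pct and summarise_by_eye are hypothetical names. The rounding (one decimal below 10%, whole percent otherwise) is inferred from the values shown.

```python
from collections import Counter


def format_n_pct(n: int, total: int) -> str:
    """Format a count as 'n (p%)': one decimal below 10%, whole percent otherwise."""
    pct = 100 * n / total
    return f"{n} ({pct:.1f}%)" if pct < 10 else f"{n} ({pct:.0f}%)"


def summarise_by_eye(rows: list[tuple[str, str]]) -> dict[str, dict[str, str]]:
    """Cross-tabulate (eye, category) pairs into n (%) cells.

    Returns {category: {"Overall": ..., "left": ..., "right": ...}} with
    percentages computed within each column. Assumes both eyes are present.
    """
    overall = Counter(cat for _, cat in rows)
    by_eye = {
        eye: Counter(cat for e, cat in rows if e == eye) for eye in ("left", "right")
    }
    totals = {"Overall": len(rows)}
    totals.update({eye: sum(counts.values()) for eye, counts in by_eye.items()})
    table: dict[str, dict[str, str]] = {}
    for cat in sorted(overall):  # alphabetical, matching the table row order
        cells = {"Overall": format_n_pct(overall[cat], totals["Overall"])}
        for eye, counts in by_eye.items():
            cells[eye] = format_n_pct(counts[cat], totals[eye])
        table[cat] = cells
    return table
```

Fed the counts from the diagnosis table above (for example 1 left and 2 right chat_partial rows), summarise_by_eye returns matching cells, such as “3 (1.6%)” overall for chat_partial.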