RetinaAI

Author

Boris

Data loading and preprocessing

Data cleaning

Missing data

  • As indicated by the data preview below, there are observations with missing values.
No. missing data  patient  age  sex  systemic_background  ophthalmic_background  symptoms
33                7        NA   NA   NA                   NA                     NA
33                13       NA   NA   NA                   NA                     NA
33                37       NA   NA   NA                   NA                     NA
33                38       NA   NA   NA                   NA                     NA
33                39       NA   NA   NA                   NA                     NA
33                40       NA   NA   NA                   NA                     NA
  • Below is the distribution of the number of missing entries per patient.

  • There is a clear cutoff between patients with many missing values (30 or more) and patients with few (4 or fewer).

  • For the excluded patients, age is missing (not a number), which is consistent with their high number of missing values.

Characteristic    N = 119¹
n_miss_all
    0             2 (1.7%)
    1             22 (18%)
    2             60 (50%)
    3             7 (5.9%)
    4             4 (3.4%)
    31            1 (0.8%)
    32            3 (2.5%)
    33            20 (17%)
¹ n (%)
  • The lines in the missingness figure (fig-missing) indicate that, out of n = 119 patients, 24 have 31 or more missing values and 95 have 4 or fewer.
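
The cutoff described above can be sketched as follows. This is a minimal illustration and not the code used for the report: the records, the helper names `count_missing` and `split_by_missingness`, and the use of `None` for NA are all assumptions. The toy records have only three data fields, so the example uses a cutoff of 2 rather than the report's cutoff of 4.

```python
# Hypothetical records; None stands in for NA.
records = [
    {"patient": 1, "age": 71, "sex": "male", "symptoms": None},
    {"patient": 7, "age": None, "sex": None, "symptoms": None},
]


def count_missing(record, id_field="patient"):
    """Count fields (other than the identifier) whose value is None."""
    return sum(
        1 for key, value in record.items()
        if key != id_field and value is None
    )


def split_by_missingness(records, max_missing=4):
    """Keep records with at most `max_missing` missing fields; exclude the rest."""
    kept, excluded = [], []
    for record in records:
        if count_missing(record) <= max_missing:
            kept.append(record)
        else:
            excluded.append(record)
    return kept, excluded


# The toy records have 3 data fields, so a cutoff of 2 is used here.
kept, excluded = split_by_missingness(records, max_missing=2)
print([r["patient"] for r in kept], [r["patient"] for r in excluded])
```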

The following plots describe the missingness in the data provided (results 16.5.23.xlsx).

Figure 1: ?(caption)

Figure 2: ?(caption)

Figure 3: ?(caption)

  • To do: check missingness separately for the human and GPT responses.

Wrangling

Sex

Sex values are converted to lowercase. One observation whose sex field contains "HD, S/P MI/ PASCEMAKER" is set to NA for now.
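
A minimal sketch of this cleaning step, under stated assumptions: the set of valid levels (`VALID_SEX`) and the function name `clean_sex` are hypothetical, since the report does not list the levels actually present in the data.

```python
# Assumption: the valid levels are not listed in the report.
VALID_SEX = {"male", "female"}


def clean_sex(value):
    """Lowercase and trim a sex entry; return None (NA) for anything unexpected."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    return cleaned if cleaned in VALID_SEX else None


print(clean_sex("Male"))
print(clean_sex("HD, S/P MI/ PASCEMAKER"))
```

Returning NA for unexpected values, rather than guessing, keeps the misplaced entry visible for later review.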

Systemic background

I created a dictionary for cleaning the data.
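
A dictionary-based cleaning step might look like the sketch below. The dictionary entries are hypothetical (the report's actual dictionary is not shown), and the sketch assumes that entries are comma-separated, which may not hold for all records.

```python
# Hypothetical entries: the report's actual dictionary is not shown.
SYSTEMIC_DICT = {
    "htn": "hypertension",
    "hypertension": "hypertension",
    "dm": "diabetes mellitus",
    "dm2": "diabetes mellitus",
}


def clean_background(raw, dictionary=SYSTEMIC_DICT):
    """Split a comma-separated entry and map each term through the dictionary.

    Terms missing from the dictionary are kept (lowercased) so they can be
    reviewed and added later.
    """
    if raw is None:
        return []
    terms = [t.strip().lower() for t in raw.split(",") if t.strip()]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(dictionary.get(t, t) for t in terms))


print(clean_background("HTN, DM2, htn"))
```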

Agreement

  • Is “S/P CVA and hemiparesis” one or two diagnoses?
    • I assumed so.
  • Can we reduce the number of unique values?
Clinical outcome - number of unique values
name                       chat  human
f_u_recommendations_other  6     9
f_u_recommendations_time   9     15
modalities_for_f_u         41    14
most_probable_diagnosis    35    29
protocol                   8     9
suggested_treatment        26    10
  • What’s a match? Are OCT and OCTA a mismatch?
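
Counts of unique values like those in the table above depend on how values are normalised before counting. The sketch below is illustrative only: the rows are hypothetical and `count_unique` is not a function from the report. It shows that lowercasing merges "OCT" and "oct" but still leaves "OCT" and "OCTA" as distinct values.

```python
def count_unique(rows, column, normalize=None):
    """Number of distinct non-missing values in `column` across `rows`."""
    if normalize is None:
        normalize = lambda value: value
    return len({
        normalize(row[column])
        for row in rows
        if row.get(column) is not None
    })


# Hypothetical rows; None stands in for NA.
chat_rows = [
    {"modalities_for_f_u": "OCT"},
    {"modalities_for_f_u": "oct"},
    {"modalities_for_f_u": "OCTA"},
    {"modalities_for_f_u": None},
]

print(count_unique(chat_rows, "modalities_for_f_u"))
print(count_unique(chat_rows, "modalities_for_f_u", normalize=str.lower))
```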

Most probable Dx

  • For each patient, we calculate whether the response from Chat was exactly the same as the human’s.

  • How to define “difference” or “distance” between Chat and Human?

  • How to define agreement when multiple diagnoses?
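
As one way to make these questions concrete, the sketch below compares diagnoses as sets of labels. Exact matching follows the description above. The Jaccard distance is only one candidate answer to the "distance" question and is not a choice made in this report. The function names and the comma-separated input format are assumptions.

```python
def parse_labels(raw):
    """Turn a comma-separated string into a set of lowercased labels."""
    if not raw:
        return set()
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def exact_match(chat, human):
    """True when both responses contain exactly the same set of labels."""
    return parse_labels(chat) == parse_labels(human)


def jaccard_distance(chat, human):
    """1 - |intersection| / |union|; 0 means identical, 1 means no overlap."""
    a, b = parse_labels(chat), parse_labels(human)
    if not a and not b:
        return 0.0  # assumption: two empty responses count as identical
    return 1 - len(a & b) / len(a | b)


print(exact_match("A, B", "b, a"))
print(jaccard_distance("A, B, C", "A, B, C, D"))
```

Treating each response as a set ignores the order of diagnoses, which may or may not be appropriate if the first diagnosis listed is meant to be the most probable one.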

Analysis of Diagnostic Data

Most Probable Diagnosis

  • How do we resolve double/triple diagnoses?

  • What is W/E? Is it like W,E (two diagnoses)? Or a separate level?

Agreement between the AI (chat) and the physician (human) is classified into four categories, illustrated here with abstract diagnosis labels:

  • Full Match: This indicates that the diagnosis generated by the AI (chat) matches entirely with the diagnosis made by the medical professional (human). For instance, if the physician and the AI both diagnose “A, B, C”, it is considered a full match.

  • Partial Match - Human Subset: This scenario happens when all the diagnoses identified by the AI are present in the physician’s diagnosis, but the AI might have missed some conditions. For example, if the physician’s diagnosis is “A, B, C, D”, and the AI’s diagnosis is “A, B, C”, the AI missed condition “D”. This is considered a partial match where the AI’s diagnosis is a subset of the human’s diagnosis.

  • Partial Match - Chat Subset: In this case, the AI identifies all conditions diagnosed by the physician, and possibly more. For instance, if the physician’s diagnosis is “A, B, C”, and the AI’s diagnosis is “A, B, C, D”, the AI has over-diagnosed with condition “D”. This is considered a partial match where the human’s diagnosis is a subset of the AI’s diagnosis.

  • Mismatch: A mismatch occurs when there’s no overlap between the AI’s and the physician’s diagnoses. For example, if the physician diagnoses “A, B, C”, and the AI diagnoses “D, E, F”, there are no common conditions, making it a mismatch.

This classification measures how closely the AI’s diagnoses align with those of the medical professional.
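
The four categories can be sketched with set comparisons. This is an illustration under stated assumptions, not the report's implementation. The returned labels are descriptive names chosen here, because the mapping between the categories above and the table labels `chat_partial` and `human_partial` is not stated. Two cases the definitions do not cover are handled by assumption: a partial overlap where neither set contains the other, and an empty response on one side, are both treated as a mismatch.

```python
def parse_labels(raw):
    """Turn a comma-separated string into a set of lowercased labels."""
    if not raw:
        return set()
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def classify_match(chat, human):
    """Classify agreement between the chat and human diagnosis sets."""
    chat_set, human_set = parse_labels(chat), parse_labels(human)
    if chat_set == human_set:
        return "full_match"
    # Proper subset: every chat diagnosis appears in the human's list.
    if chat_set and chat_set < human_set:
        return "chat_subset_of_human"
    # Proper subset: every human diagnosis appears in the chat's list.
    if human_set and human_set < chat_set:
        return "human_subset_of_chat"
    # Assumption: partial overlap without containment counts as a mismatch.
    return "mismatch"


print(classify_match("A, B, C", "A, B, C, D"))
print(classify_match("D, E, F", "A, B, C"))
```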

Characteristic       Overall, N = 190¹   left, N = 95¹   right, N = 95¹
diagnosis_match
    chat_partial     3 (1.6%)            1 (1.1%)        2 (2.1%)
    full_match       76 (40%)            39 (41%)        37 (39%)
    human_partial    20 (11%)            8 (8.4%)        12 (13%)
    mismatch         91 (48%)            47 (49%)        44 (46%)
¹ n (%)
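
The n (%) cells in these tables can be reproduced from a list of match labels. The sketch below rebuilds the Overall column of the diagnosis table from its counts. The rounding rule (one decimal below 10%, whole numbers otherwise) is inferred from the values shown in the tables, and the function names are hypothetical.

```python
from collections import Counter


def format_percent(share):
    """Format a proportion as a percentage: one decimal below 10%, else none."""
    pct = 100 * share
    return f"{pct:.0f}%" if pct >= 10 else f"{pct:.1f}%"


def summarise(labels):
    """Return {label: 'n (%)'} for a list of category labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: f"{n} ({format_percent(n / total)})"
        for label, n in sorted(counts.items())
    }


# Counts taken from the Overall column of the diagnosis table (N = 190).
labels = (
    ["chat_partial"] * 3
    + ["full_match"] * 76
    + ["human_partial"] * 20
    + ["mismatch"] * 91
)
for label, cell in summarise(labels).items():
    print(f"{label:<15}{cell}")
```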

Suggested Treatment

  • Removed the duration from the treatment entries.
Characteristic       Overall, N = 190¹   left, N = 95¹   right, N = 95¹
treatment_match
    chat_partial     25 (13%)            21 (22%)        4 (4.2%)
    full_match       66 (35%)            36 (38%)        30 (32%)
    human_partial    85 (45%)            32 (34%)        53 (56%)
    mismatch         14 (7.4%)           6 (6.3%)        8 (8.4%)
¹ n (%)

Protocol

Characteristic       Overall, N = 190¹   left, N = 95¹   right, N = 95¹
protocol_match
    full_match       149 (78%)           76 (80%)        73 (77%)
    human_partial    3 (1.6%)            2 (2.1%)        1 (1.1%)
    mismatch         38 (20%)            17 (18%)        21 (22%)
¹ n (%)

Follow-up Recommendations Time

Characteristic       N = 95¹
time_match
    full_match       35 (37%)
    mismatch         60 (63%)
¹ n (%)

Follow-up Recommendations Other

Characteristic       N = 95¹
other_match
    chat_partial     23 (24%)
    full_match       50 (53%)
    human_partial    19 (20%)
    mismatch         3 (3.2%)
¹ n (%)

Modalities for Follow-up

Characteristic       N = 95¹
modality_match
    human_partial    8 (8.4%)
    mismatch         87 (92%)
¹ n (%)