This dataset originates from a retrospective study involving pediatric patients who were admitted to Children’s Hospital St. Hedwig in Regensburg, Germany, with symptoms of abdominal pain. For most patients, multiple B-mode abdominal ultrasound images were collected, with the number of views ranging from 1 to 15. These images capture key anatomical areas such as the right lower quadrant, appendix, intestines, lymph nodes, and reproductive organs. In addition to the ultrasound images, the dataset includes comprehensive clinical data such as laboratory results, physical examination findings, clinical scoring metrics (e.g., Alvarado and pediatric appendicitis scores), and expert interpretations of the ultrasound scans. Each patient is also labeled according to three clinical outcomes: final diagnosis (appendicitis vs. no appendicitis), treatment approach (surgical vs. conservative), and disease severity (complicated vs. uncomplicated or no appendicitis). The study received approval from the Ethics Committee of the University of Regensburg (reference numbers 18-1063-101, 18-1063_1-101, and 18-1063_2-101) and was conducted in accordance with all applicable ethical guidelines and regulations.
The dataset is structured into several clinically relevant feature groups that collectively support diagnostic modeling for pediatric appendicitis:
- Demographics and physiological measurements: basic patient characteristics and physiological measurements (e.g., age, sex, height, weight, BMI, body temperature).
- Clinical course and management: information about diagnostic stages and clinical decisions (e.g., length of stay).
- Clinical scores: standardized scores used to assess the likelihood of appendicitis (Alvarado Score, Pediatric Appendicitis Score).
- Laboratory tests: blood- and urine-based diagnostic markers.
- Symptoms and physical examination: binary indicators representing patient-reported symptoms or physical exam findings.
- Ultrasonographic findings: ultrasound results and radiologic signs suggestive of appendicitis.
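As a rough illustration, the tabular columns can be organized along these groups. The assignment below is our own and is not part of the dataset's metadata; only the column names themselves come from the repository.
# Illustrative grouping of the tabular columns (our own assignment, not dataset metadata)
feature_groups = {
    "demographics_and_vitals": ["Age", "Sex", "BMI", "Height", "Weight", "Body_Temperature"],
    "clinical_course_and_management": ["Length_of_Stay", "Management", "Severity", "Diagnosis_Presumptive"],
    "clinical_scores": ["Alvarado_Score", "Paedriatic_Appendicitis_Score"],
    "laboratory_tests": ["WBC_Count", "Neutrophil_Percentage", "Segmented_Neutrophils", "Neutrophilia",
                         "RBC_Count", "Hemoglobin", "RDW", "Thrombocyte_Count", "CRP",
                         "Ketones_in_Urine", "RBC_in_Urine", "WBC_in_Urine"],
    "symptoms_and_exam": ["Migratory_Pain", "Lower_Right_Abd_Pain", "Contralateral_Rebound_Tenderness",
                          "Ipsilateral_Rebound_Tenderness", "Coughing_Pain", "Nausea", "Loss_of_Appetite",
                          "Dysuria", "Stool", "Peritonitis", "Psoas_Sign"],
    "ultrasound_findings": ["US_Performed", "Appendix_on_US", "Appendix_Diameter", "Free_Fluids",
                            "Appendix_Wall_Layers", "Target_Sign", "Appendicolith", "Perfusion",
                            "Perforation", "Surrounding_Tissue_Reaction", "Appendicular_Abscess",
                            "Abscess_Location", "Pathological_Lymph_Nodes", "Lymph_Nodes_Location",
                            "Bowel_Wall_Thickening", "Conglomerate_of_Bowel_Loops", "Ileus",
                            "Coprostasis", "Meteorism", "Enteritis", "Gynecological_Findings"],
}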
Loading libraries and data sources
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ucimlrepo import fetch_ucirepo
# fetch dataset
regensburg_pediatric_appendicitis = fetch_ucirepo(id=938)
# data (as pandas dataframes)
X = regensburg_pediatric_appendicitis.data.features
y = regensburg_pediatric_appendicitis.data.targets
# metadata
print(regensburg_pediatric_appendicitis.metadata)
## {'uci_id': 938, 'name': 'Regensburg Pediatric Appendicitis', 'repository_url': 'https://archive.ics.uci.edu/dataset/938/regensburg+pediatric+appendicitis', 'data_url': 'https://archive.ics.uci.edu/static/public/938/data.csv', 'abstract': 'This repository holds the data from a cohort of pediatric patients with suspected appendicitis admitted with abdominal pain to Children’s Hospital St. Hedwig in Regensburg, Germany, between 2016 and 2021. Each patient has (potentially multiple) ultrasound (US) images, aka views, tabular data comprising laboratory, physical examination, scoring results and ultrasonographic findings extracted manually by the experts, and three target variables, namely, diagnosis, management and severity.', 'area': 'Health and Medicine', 'tasks': ['Classification'], 'characteristics': ['Tabular', 'Image'], 'num_instances': 782, 'num_features': 53, 'feature_types': ['Real', 'Categorical', 'Integer'], 'demographics': ['Age', 'Sex'], 'target_col': ['Management', 'Severity', 'Diagnosis'], 'index_col': None, 'has_missing_values': 'yes', 'missing_values_symbol': 'NaN', 'year_of_dataset_creation': 2023, 'last_updated': 'Tue Feb 06 2024', 'dataset_doi': '10.5281/zenodo.7669442', 'creators': ['Ricards Marcinkevics', 'Patricia Reis', 'Ugne Klimiene', 'Ece Ozkan', 'Kieran Chin-Cheong', 'Alyssia Paschke', 'Julia Zerres', 'Markus Denzinger', 'David Niederberger', 'S. Wellmann', 'C. Knorr', 'Julia E.'], 'intro_paper': {'ID': 354, 'type': 'NATIVE', 'title': 'Interpretable and Intervenable Ultrasonography-based Machine Learning Models for Pediatric Appendicitis', 'authors': 'Ricards Marcinkevics, Patricia Reis Wolfertstetter, Ugne Klimiene, Ece Ozkan, Kieran Chin-Cheong, Alyssia Paschke, Julia Zerres, Markus Denzinger, David Niederberger, S. Wellmann, C. Knorr, Julia E. Vogt', 'venue': 'Medical Image Analysis', 'year': 2023, 'journal': None, 'DOI': None, 'URL': 'https://arxiv.org/abs/2302.14460v2', 'sha': None, 'corpus': None, 'arxiv': None, 'mag': None, 'acl': None, 'pmid': None, 'pmcid': None}, 'additional_info': {'summary': 'This dataset was acquired in a retrospective study from a cohort of pediatric patients admitted with abdominal pain to Children’s Hospital St. Hedwig in Regensburg, Germany. Multiple abdominal B-mode ultrasound images were acquired for most patients, with the number of views varying from 1 to 15. The images depict various regions of interest, such as the abdomen’s right lower quadrant, appendix, intestines, lymph nodes and reproductive organs. Alongside multiple US images for each subject, the dataset includes information encompassing laboratory tests, physical examination results, clinical scores, such as Alvarado and pediatric appendicitis scores, and expert-produced ultrasonographic findings. Lastly, the subjects were labeled w.r.t. three target variables: diagnosis (appendicitis vs. no appendicitis), management (surgical vs. conservative) and severity (complicated vs. uncomplicated or no appendicitis). The study was approved by the Ethics Committee of the University of Regensburg (no. 18-1063-101, 18-1063_1-101 and 18-1063_2-101) and was performed following applicable guidelines and regulations.', 'purpose': None, 'funded_by': None, 'instances_represent': None, 'recommended_data_splits': None, 'sensitive_data': None, 'preprocessing_description': None, 'variable_info': None, 'citation': None}, 'external_url': 'https://zenodo.org/records/7669442'}
# variable information
print(regensburg_pediatric_appendicitis.variables)
## name role ... units missing_values
## 0 Age Feature ... years yes
## 1 BMI Feature ... None yes
## 2 Sex Feature ... None yes
## 3 Height Feature ... None yes
## 4 Weight Feature ... None yes
## 5 Length_of_Stay Feature ... None yes
## 6 Management Target ... None yes
## 7 Severity Target ... None yes
## 8 Diagnosis_Presumptive Other ... None yes
## 9 Diagnosis Target ... None yes
## 10 Alvarado_Score Feature ... None yes
## 11 Paedriatic_Appendicitis_Score Feature ... None yes
## 12 Appendix_on_US Feature ... None yes
## 13 Appendix_Diameter Feature ... None yes
## 14 Migratory_Pain Feature ... None yes
## 15 Lower_Right_Abd_Pain Feature ... None yes
## 16 Contralateral_Rebound_Tenderness Feature ... None yes
## 17 Coughing_Pain Feature ... None yes
## 18 Nausea Feature ... None yes
## 19 Loss_of_Appetite Feature ... None yes
## 20 Body_Temperature Feature ... None yes
## 21 WBC_Count Feature ... None yes
## 22 Neutrophil_Percentage Feature ... None yes
## 23 Segmented_Neutrophils Feature ... None yes
## 24 Neutrophilia Feature ... None yes
## 25 RBC_Count Feature ... None yes
## 26 Hemoglobin Feature ... None yes
## 27 RDW Feature ... None yes
## 28 Thrombocyte_Count Feature ... None yes
## 29 Ketones_in_Urine Feature ... None yes
## 30 RBC_in_Urine Feature ... None yes
## 31 WBC_in_Urine Feature ... None yes
## 32 CRP Feature ... None yes
## 33 Dysuria Feature ... None yes
## 34 Stool Feature ... None yes
## 35 Peritonitis Feature ... None yes
## 36 Psoas_Sign Feature ... None yes
## 37 Ipsilateral_Rebound_Tenderness Feature ... None yes
## 38 US_Performed Feature ... None yes
## 39 US_Number Other ... None yes
## 40 Free_Fluids Feature ... None yes
## 41 Appendix_Wall_Layers Feature ... None yes
## 42 Target_Sign Feature ... None yes
## 43 Appendicolith Feature ... None yes
## 44 Perfusion Feature ... None yes
## 45 Perforation Feature ... None yes
## 46 Surrounding_Tissue_Reaction Feature ... None yes
## 47 Appendicular_Abscess Feature ... None yes
## 48 Abscess_Location Feature ... None yes
## 49 Pathological_Lymph_Nodes Feature ... None yes
## 50 Lymph_Nodes_Location Feature ... None yes
## 51 Bowel_Wall_Thickening Feature ... None yes
## 52 Conglomerate_of_Bowel_Loops Feature ... None yes
## 53 Ileus Feature ... None yes
## 54 Coprostasis Feature ... None yes
## 55 Meteorism Feature ... None yes
## 56 Enteritis Feature ... None yes
## 57 Gynecological_Findings Feature ... None yes
##
## [58 rows x 7 columns]
Dataset Overview
Total patients (rows): 782
Total recorded variables: 58 (53 features, 3 targets, 2 columns marked "Other")
Target variables: Management, Severity, and Diagnosis; this analysis focuses on Diagnosis ("appendicitis" vs. "no appendicitis").
y.Diagnosis.value_counts()
## Diagnosis
## appendicitis 463
## no appendicitis 317
## Name: count, dtype: int64
# Basic summary of the dataset
summary_info = {
"Shape": X.shape,
"Columns": X.columns.tolist(),
"Data Types": X.dtypes,
"Missing Values": X.isnull().sum(),
"Sample Rows": X.head()
}
summary_info
## {'Shape': (782, 53), 'Columns': ['Age', 'BMI', 'Sex', 'Height', 'Weight', 'Length_of_Stay', 'Alvarado_Score', 'Paedriatic_Appendicitis_Score', 'Appendix_on_US', 'Appendix_Diameter', 'Migratory_Pain', 'Lower_Right_Abd_Pain', 'Contralateral_Rebound_Tenderness', 'Coughing_Pain', 'Nausea', 'Loss_of_Appetite', 'Body_Temperature', 'WBC_Count', 'Neutrophil_Percentage', 'Segmented_Neutrophils', 'Neutrophilia', 'RBC_Count', 'Hemoglobin', 'RDW', 'Thrombocyte_Count', 'Ketones_in_Urine', 'RBC_in_Urine', 'WBC_in_Urine', 'CRP', 'Dysuria', 'Stool', 'Peritonitis', 'Psoas_Sign', 'Ipsilateral_Rebound_Tenderness', 'US_Performed', 'Free_Fluids', 'Appendix_Wall_Layers', 'Target_Sign', 'Appendicolith', 'Perfusion', 'Perforation', 'Surrounding_Tissue_Reaction', 'Appendicular_Abscess', 'Abscess_Location', 'Pathological_Lymph_Nodes', 'Lymph_Nodes_Location', 'Bowel_Wall_Thickening', 'Conglomerate_of_Bowel_Loops', 'Ileus', 'Coprostasis', 'Meteorism', 'Enteritis', 'Gynecological_Findings'], 'Data Types': Age float64
## BMI float64
## Sex object
## Height float64
## Weight float64
## Length_of_Stay float64
## Alvarado_Score float64
## Paedriatic_Appendicitis_Score float64
## Appendix_on_US object
## Appendix_Diameter float64
## Migratory_Pain object
## Lower_Right_Abd_Pain object
## Contralateral_Rebound_Tenderness object
## Coughing_Pain object
## Nausea object
## Loss_of_Appetite object
## Body_Temperature float64
## WBC_Count float64
## Neutrophil_Percentage float64
## Segmented_Neutrophils float64
## Neutrophilia object
## RBC_Count float64
## Hemoglobin float64
## RDW float64
## Thrombocyte_Count float64
## Ketones_in_Urine object
## RBC_in_Urine object
## WBC_in_Urine object
## CRP float64
## Dysuria object
## Stool object
## Peritonitis object
## Psoas_Sign object
## Ipsilateral_Rebound_Tenderness object
## US_Performed object
## Free_Fluids object
## Appendix_Wall_Layers object
## Target_Sign object
## Appendicolith object
## Perfusion object
## Perforation object
## Surrounding_Tissue_Reaction object
## Appendicular_Abscess object
## Abscess_Location object
## Pathological_Lymph_Nodes object
## Lymph_Nodes_Location object
## Bowel_Wall_Thickening object
## Conglomerate_of_Bowel_Loops object
## Ileus object
## Coprostasis object
## Meteorism object
## Enteritis object
## Gynecological_Findings object
## dtype: object, 'Missing Values': Age 1
## BMI 27
## Sex 2
## Height 26
## Weight 3
## Length_of_Stay 4
## Alvarado_Score 52
## Paedriatic_Appendicitis_Score 52
## Appendix_on_US 5
## Appendix_Diameter 284
## Migratory_Pain 9
## Lower_Right_Abd_Pain 8
## Contralateral_Rebound_Tenderness 15
## Coughing_Pain 16
## Nausea 8
## Loss_of_Appetite 10
## Body_Temperature 7
## WBC_Count 6
## Neutrophil_Percentage 103
## Segmented_Neutrophils 728
## Neutrophilia 50
## RBC_Count 18
## Hemoglobin 18
## RDW 26
## Thrombocyte_Count 18
## Ketones_in_Urine 200
## RBC_in_Urine 206
## WBC_in_Urine 199
## CRP 11
## Dysuria 29
## Stool 17
## Peritonitis 9
## Psoas_Sign 37
## Ipsilateral_Rebound_Tenderness 163
## US_Performed 4
## Free_Fluids 63
## Appendix_Wall_Layers 564
## Target_Sign 644
## Appendicolith 713
## Perfusion 719
## Perforation 701
## Surrounding_Tissue_Reaction 530
## Appendicular_Abscess 697
## Abscess_Location 769
## Pathological_Lymph_Nodes 579
## Lymph_Nodes_Location 661
## Bowel_Wall_Thickening 683
## Conglomerate_of_Bowel_Loops 739
## Ileus 722
## Coprostasis 711
## Meteorism 642
## Enteritis 716
## Gynecological_Findings 756
## dtype: int64, 'Sample Rows': Age BMI Sex ... Meteorism Enteritis Gynecological_Findings
## 0 12.68 16.9 female ... NaN NaN NaN
## 1 14.10 31.9 male ... yes NaN NaN
## 2 14.14 23.3 female ... yes yes NaN
## 3 16.37 20.6 female ... NaN yes NaN
## 4 11.08 16.9 female ... NaN yes NaN
##
## [5 rows x 53 columns]}
numerical_cols = X.select_dtypes(include=['float64','int64']).columns.to_list()
categorical_cols = X.select_dtypes(include=['object']).columns.to_list()
df = X.copy()  # work on an explicit copy so the original feature table stays untouched
df['Diagnosis'] = y['Diagnosis']
df.head()
## Age BMI Sex ... Enteritis Gynecological_Findings Diagnosis
## 0 12.68 16.9 female ... NaN NaN appendicitis
## 1 14.10 31.9 male ... NaN NaN no appendicitis
## 2 14.14 23.3 female ... yes NaN no appendicitis
## 3 16.37 20.6 female ... yes NaN no appendicitis
## 4 11.08 16.9 female ... yes NaN appendicitis
##
## [5 rows x 54 columns]
diagnosis_distribution = df['Diagnosis'].value_counts(dropna=False)
grouped_stats = df.groupby('Diagnosis')[numerical_cols].agg(['mean', 'median', 'std'])
grouped_stats
## Age ... CRP
## mean median std ... mean median std
## Diagnosis ...
## appendicitis 11.082782 11.36 3.557869 ... 44.902188 16.0 68.515050
## no appendicitis 11.720189 11.90 3.459236 ... 11.716561 1.0 24.920434
##
## [2 rows x 51 columns]
The graph below shows that while the average age is similar between children with and without appendicitis, C-reactive protein (CRP) levels are notably higher in the appendicitis group. Patients diagnosed with appendicitis had a mean CRP of 44.9 mg/L compared to just 11.7 mg/L in those without, highlighting CRP’s potential as a strong inflammatory marker for identifying acute appendicitis in pediatric patients.
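As a quick numeric check of these figures, a small sketch pulling the per-diagnosis CRP statistics out of the grouped_stats table computed above:
# Per-diagnosis mean and median CRP from the grouped statistics computed earlier
crp_summary = grouped_stats['CRP'][['mean', 'median']].round(1)
print(crp_summary)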
import seaborn as sns
selected_features = ['WBC_Count', 'CRP', 'Appendix_Diameter']
fig, axes = plt.subplots(1,3, figsize = (18,5))
for idx, feature in enumerate(selected_features):
sns.boxplot(data=df, x='Diagnosis', y= feature, ax=axes[idx])
axes[idx].set_title(f'{feature} by Diagnosis')
axes[idx].set_xlabel('Diagnosis')
axes[idx].set_ylabel(feature)
plt.tight_layout()
Handling Missing Values
In the Regensburg Pediatric Appendicitis dataset, several features exhibit high missingness not because of data entry errors or loss, but because the variables were intentionally left unrecorded under specific clinical conditions. This is known as structured or conditional missingness, where data is only collected if relevant. For instance, variables such as Abscess_Location (98.34% missing), Gynecological_Findings (96.68%), and Conglomerate_of_Bowel_Loops (94.5%) relate to findings that only apply when certain complications are present or when specific tests, typically imaging such as ultrasound or CT, are performed. Similarly, Segmented_Neutrophils is missing in 93.09% of records, suggesting that this detailed blood test was not part of routine labs for most patients. The same applies to other features such as Ileus (92.33%), Perfusion (91.94%), and Appendicolith (91.18%), all of which rely on imaging being performed and relevant findings being observed. These patterns strongly indicate that the data is not missing at random, but rather skipped because the measurement was not clinically indicated or not observed. It should therefore be handled carefully during modeling, for example by creating binary indicators for test execution or by using imputation methods appropriate for informative missingness.
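One way to keep this information visible to a model is scikit-learn's add_indicator option, which appends a binary "was missing" flag alongside each imputed column. Below is a minimal sketch; the column choice and the median strategy are illustrative and not the pipeline used later in this report.
from sklearn.impute import SimpleImputer
# add_indicator=True appends a binary missingness flag for every column containing NaNs
imputer = SimpleImputer(strategy='median', add_indicator=True)
num_cols = ['Appendix_Diameter', 'Segmented_Neutrophils']  # illustrative numeric columns
imputed = imputer.fit_transform(X[num_cols])
print(imputed.shape)  # imputed values plus one indicator column per column with missing entries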
df.info(show_counts=True)
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 782 entries, 0 to 781
## Data columns (total 54 columns):
## # Column Non-Null Count Dtype
## --- ------ -------------- -----
## 0 Age 781 non-null float64
## 1 BMI 755 non-null float64
## 2 Sex 780 non-null object
## 3 Height 756 non-null float64
## 4 Weight 779 non-null float64
## 5 Length_of_Stay 778 non-null float64
## 6 Alvarado_Score 730 non-null float64
## 7 Paedriatic_Appendicitis_Score 730 non-null float64
## 8 Appendix_on_US 777 non-null object
## 9 Appendix_Diameter 498 non-null float64
## 10 Migratory_Pain 773 non-null object
## 11 Lower_Right_Abd_Pain 774 non-null object
## 12 Contralateral_Rebound_Tenderness 767 non-null object
## 13 Coughing_Pain 766 non-null object
## 14 Nausea 774 non-null object
## 15 Loss_of_Appetite 772 non-null object
## 16 Body_Temperature 775 non-null float64
## 17 WBC_Count 776 non-null float64
## 18 Neutrophil_Percentage 679 non-null float64
## 19 Segmented_Neutrophils 54 non-null float64
## 20 Neutrophilia 732 non-null object
## 21 RBC_Count 764 non-null float64
## 22 Hemoglobin 764 non-null float64
## 23 RDW 756 non-null float64
## 24 Thrombocyte_Count 764 non-null float64
## 25 Ketones_in_Urine 582 non-null object
## 26 RBC_in_Urine 576 non-null object
## 27 WBC_in_Urine 583 non-null object
## 28 CRP 771 non-null float64
## 29 Dysuria 753 non-null object
## 30 Stool 765 non-null object
## 31 Peritonitis 773 non-null object
## 32 Psoas_Sign 745 non-null object
## 33 Ipsilateral_Rebound_Tenderness 619 non-null object
## 34 US_Performed 778 non-null object
## 35 Free_Fluids 719 non-null object
## 36 Appendix_Wall_Layers 218 non-null object
## 37 Target_Sign 138 non-null object
## 38 Appendicolith 69 non-null object
## 39 Perfusion 63 non-null object
## 40 Perforation 81 non-null object
## 41 Surrounding_Tissue_Reaction 252 non-null object
## 42 Appendicular_Abscess 85 non-null object
## 43 Abscess_Location 13 non-null object
## 44 Pathological_Lymph_Nodes 203 non-null object
## 45 Lymph_Nodes_Location 121 non-null object
## 46 Bowel_Wall_Thickening 99 non-null object
## 47 Conglomerate_of_Bowel_Loops 43 non-null object
## 48 Ileus 60 non-null object
## 49 Coprostasis 71 non-null object
## 50 Meteorism 140 non-null object
## 51 Enteritis 66 non-null object
## 52 Gynecological_Findings 26 non-null object
## 53 Diagnosis 780 non-null object
## dtypes: float64(17), object(37)
## memory usage: 330.0+ KB
# Calculate missing percentages
missing_percentages = (df.isnull().sum() / len(df)) * 100
missing_percentages = missing_percentages.sort_values(ascending=False)
# Format top 10 missing variables for display
top_missing = missing_percentages.head(15).round(2).astype(str) + '%'
top_missing_df = top_missing.reset_index()
top_missing_df.columns = ['Feature', 'Missing Percentage']
import ace_tools_open as tools
tools.display_dataframe_to_user(name="Top Missing Features", dataframe=top_missing_df)
## Top Missing Features
## Feature Missing Percentage
## 0 Abscess_Location 98.34%
## 1 Gynecological_Findings 96.68%
## 2 Conglomerate_of_Bowel_Loops 94.5%
## 3 Segmented_Neutrophils 93.09%
## 4 Ileus 92.33%
## 5 Perfusion 91.94%
## 6 Enteritis 91.56%
## 7 Appendicolith 91.18%
## 8 Coprostasis 90.92%
## 9 Perforation 89.64%
## 10 Appendicular_Abscess 89.13%
## 11 Bowel_Wall_Thickening 87.34%
## 12 Lymph_Nodes_Location 84.53%
## 13 Target_Sign 82.35%
## 14 Meteorism 82.1%
We create new binary indicator features that flag whether certain high-missingness variables were recorded (i.e., not null) for each patient. These indicators help capture the informative nature of missingness, especially when data is absent due to conditional testing, such as imaging not being performed.
# Create "_reported" indicator features for the top high-missingness variables
df['Appendicular_Abscess_reported'] = df['Appendicular_Abscess'].notnull().astype(int)
df['Abscess_Location_reported'] = df['Abscess_Location'].notnull().astype(int)
df['Conglomerate_of_Bowel_Loops_reported'] = df['Conglomerate_of_Bowel_Loops'].notnull().astype(int)
df['Ileus_reported'] = df['Ileus'].notnull().astype(int)
df['Segmented_Neutrophils_reported'] = df['Segmented_Neutrophils'].notnull().astype(int)
df['Enteritis_reported'] = df['Enteritis'].notnull().astype(int)
#df['Perfusion_reported'] = df['Perfusion'].notnull().astype(int)
df['Appendicolith_reported'] = df['Appendicolith'].notnull().astype(int)
df['Coprostasis_reported'] = df['Coprostasis'].notnull().astype(int)
df['Perforation_reported'] = df['Perforation'].notnull().astype(int)
df['Meteorism_reported'] = df['Meteorism'].notnull().astype(int)
df['Lymph_Nodes_Location_reported'] = df['Lymph_Nodes_Location'].notnull().astype(int)
df['Target_Sign_reported'] = df['Target_Sign'].notnull().astype(int)
df['Bowel_Wall_Thickening_reported'] = df['Bowel_Wall_Thickening'].notnull().astype(int)
Here we compare how often these imaging-related features were reported (i.e., not missing) between patients diagnosed with appendicitis and those without. Each feature (e.g., Target_Sign_reported, Appendicolith_reported) indicates whether a particular ultrasound or imaging finding was recorded.
Overall, patients with appendicitis show a much higher number of reported features across the board, particularly for critical indicators like Target_Sign_reported, Bowel_Wall_Thickening_reported, and Appendicolith_reported. This suggests that imaging was more frequently performed, or yielded more documented findings, in patients with appendicitis, reinforcing the idea that missingness itself is informative and may reflect diagnostic pathways.
# Create features indicating whether a report was available (not null)
report_features = [
'Appendicular_Abscess', 'Abscess_Location', 'Conglomerate_of_Bowel_Loops', 'Ileus',
'Segmented_Neutrophils', 'Enteritis', 'Appendicolith', 'Coprostasis', 'Perforation',
'Meteorism', 'Lymph_Nodes_Location', 'Target_Sign', 'Bowel_Wall_Thickening'
]
# Add binary indicator features for each
for feature in report_features:
reported_col = feature + '_reported'
df[reported_col] = df[feature].notnull().astype(int)
# Subset just the report indicators + diagnosis
reported_cols = [col + '_reported' for col in report_features]
report_summary = df.groupby("Diagnosis")[reported_cols].sum().T
# Plot
report_summary.plot(kind='barh', stacked=False, figsize=(12, 8))
plt.title("Number of Reported Features by Diagnosis")
plt.xlabel("Number of Reports")
plt.ylabel("Feature")
plt.legend(title='Diagnosis')
plt.tight_layout()
plt.show()
When we look at the distribution of sex across the diagnosis groups, we notice that males are more frequently diagnosed with appendicitis, while females are more prevalent in the no-appendicitis group, suggesting a potential sex-based difference in clinical presentation or diagnostic patterns.
# Set plot style
sns.set(style="whitegrid")
# Plot 1: Count of Sex by Diagnosis
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x='Diagnosis', hue='Sex')
plt.title("Sex Distribution by Diagnosis")
plt.xlabel("Diagnosis")
plt.ylabel("Count")
plt.legend(title="Sex")
plt.tight_layout()
plt.show()
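To put numbers behind the count plot, a short sketch cross-tabulating sex against diagnosis (proportions within each diagnosis group):
# Share of each sex within the appendicitis and no-appendicitis groups
sex_by_diagnosis = pd.crosstab(df['Diagnosis'], df['Sex'], normalize='index').round(2)
print(sex_by_diagnosis)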
The first plot illustrates the distribution of length of hospital stay by sex, showing that the majority of both male and female pediatric patients stayed between 2 and 4 days, with a peak at 3 days. Male patients had slightly more cases in shorter stays (2–4 days), while females were more evenly distributed across slightly longer stays, though still within a short admission window. The second plot displays the age distribution by sex using grouped age bands. Both males and females were most commonly diagnosed with appendicitis between the ages of 11–15, followed by the 5–10 and 16–20 age ranges. Interestingly, males are slightly more represented in each age group, especially during the peak adolescent years (11–15), which aligns with known clinical trends where appendicitis incidence is marginally higher in adolescent boys. Overall, these plots highlight that both length of stay and age distributions show subtle but important sex-based patterns relevant to diagnosis and hospital resource use.
# Plot 2: Length of Stay counts by Sex
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x='Length_of_Stay', hue='Sex')
plt.title("Length of Stay Distribution by Sex")
plt.xlabel("Length of Stay (days)")
plt.ylabel("Count")
plt.legend(title="Sex")
plt.tight_layout()
plt.show()
# Create age category
bins = [0, 5,10,15,20]
labels = ['0-4', '5-10', '11-15', '16-20']
df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels, right=True, include_lowest=True)
# Display count of patients in each age group
age_group_counts = df['Age_Group'].value_counts().sort_index()
age_group_counts_df = age_group_counts.reset_index()
age_group_counts_df.columns = ['Age_Group', 'Count']
import ace_tools_open as tools; tools.display_dataframe_to_user(name="Age Group Counts", dataframe=age_group_counts_df)
## Age Group Counts
## Age_Group Count
## 0 0-4 43
## 1 5-10 209
## 2 11-15 399
## 3 16-20 130
# Plot 3: Age Group counts by Sex
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x='Age_Group', hue='Sex')
plt.title("Sex Distribution by Age Group")
plt.xlabel("Age Group")
plt.ylabel("Count")
plt.legend(title="Sex")
plt.tight_layout()
plt.show()
The histogram and boxplot visualizations provide valuable insight into the distribution and variability of numerical features in the pediatric appendicitis dataset. The histograms reveal that variables such as Age, BMI, Height, and Weight follow relatively normal or mildly skewed distributions, while others like CRP, Length of Stay, and RDW exhibit strong right-skewness, indicating that a small number of patients have significantly higher values than the rest. The score-based features like the Alvarado Score and Pediatric Appendicitis Score appear more uniformly distributed due to their discrete nature. Complementing this, the boxplots highlight a wide spread and the presence of multiple outliers in variables like CRP, Length of Stay, and RDW, which may represent clinically severe or atypical cases. These observations underscore the need for appropriate preprocessing steps such as scaling, transformation, or careful outlier treatment before model training, especially in a clinical setting where outliers may carry important diagnostic relevance.
# Plot histograms
num_plots = len(numerical_cols)
cols = 4
rows = (num_plots + cols - 1) // cols
plt.figure(figsize=(cols * 5, rows * 4))
for i, col in enumerate(numerical_cols):
plt.subplot(rows, cols, i + 1)
df[col].dropna().hist(bins=30)
plt.title(col)
plt.tight_layout()
plt.suptitle("Distribution of Numerical Variables", fontsize=16, y=1.02)
plt.tight_layout()
plt.show()
# Select only numerical columns
numerical_cols = df.select_dtypes(include=['float64', 'int64']).columns.tolist()
# Plot boxplots for each numerical variable
num_plots = len(numerical_cols)
cols = 4
rows = (num_plots + cols - 1) // cols
plt.figure(figsize=(cols * 5, rows * 4))
for i, col in enumerate(numerical_cols):
plt.subplot(rows, cols, i + 1)
sns.boxplot(x=df[col], orient='h')
plt.title(col)
plt.tight_layout()
plt.suptitle("Boxplots of Numerical Variables", fontsize=16, y=1.02)
plt.tight_layout()
plt.show()
This correlation plot highlights how numerical and indicator features relate to the diagnosis of appendicitis. Variables such as Appendix Diameter, Segmented Neutrophils, and Alvarado Score show strong positive correlations with the diagnosis, indicating their significant predictive value, while others like BMI and Weight show minimal or no association.
# Convert diagnosis to binary
df['Diagnosis_Binary'] = df['Diagnosis'].map({'appendicitis': 1, 'no appendicitis': 0})
# Select only numerical columns and compute correlation
numerical_cols = df.select_dtypes(include=['float64', 'int64']).columns.tolist()
correlation_with_diagnosis = df[numerical_cols].corr()['Diagnosis_Binary'].drop('Diagnosis_Binary')
corr_sorted = correlation_with_diagnosis.sort_values()
# Plot with extended height and smaller font
plt.figure(figsize=(10, len(corr_sorted) * 0.5)) # Dynamic height based on number of features
bars = plt.barh(corr_sorted.index, corr_sorted.values, color='skyblue')
plt.title("Correlation of Numerical Features with Diagnosis", fontsize=14)
plt.xlabel("Correlation with Diagnosis (appendicitis = 1)", fontsize=12)
# Add value labels
for bar in bars:
width = bar.get_width()
plt.text(width + 0.01 if width >= 0 else width - 0.05,
bar.get_y() + bar.get_height() / 2,
f'{width:.2f}',
va='center', ha='left' if width >= 0 else 'right', fontsize=9)
plt.xticks(fontsize=10)
## (array([-0.2, -0.1, 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]), [Text(-0.2, 0, '−0.2'), Text(-0.1, 0, '−0.1'), Text(0.0, 0, '0.0'), Text(0.10000000000000003, 0, '0.1'), Text(0.2, 0, '0.2'), Text(0.3, 0, '0.3'), Text(0.4000000000000001, 0, '0.4'), Text(0.5, 0, '0.5'), Text(0.6000000000000001, 0, '0.6'), Text(0.7, 0, '0.7')])
plt.yticks(fontsize=10)
## ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [Text(0, 0, 'BMI'), Text(0, 1, 'Weight'), Text(0, 2, 'Enteritis_reported'), Text(0, 3, 'Lymph_Nodes_Location_reported'), Text(0, 4, 'Meteorism_reported'), Text(0, 5, 'Age'), Text(0, 6, 'Height'), Text(0, 7, 'RBC_Count'), Text(0, 8, 'Segmented_Neutrophils_reported'), Text(0, 9, 'Hemoglobin'), Text(0, 10, 'Thrombocyte_Count'), Text(0, 11, 'Coprostasis_reported'), Text(0, 12, 'RDW'), Text(0, 13, 'Bowel_Wall_Thickening_reported'), Text(0, 14, 'Abscess_Location_reported'), Text(0, 15, 'Conglomerate_of_Bowel_Loops_reported'), Text(0, 16, 'Body_Temperature'), Text(0, 17, 'Target_Sign_reported'), Text(0, 18, 'Ileus_reported'), Text(0, 19, 'Appendicular_Abscess_reported'), Text(0, 20, 'Appendicolith_reported'), Text(0, 21, 'Perforation_reported'), Text(0, 22, 'CRP'), Text(0, 23, 'Paedriatic_Appendicitis_Score'), Text(0, 24, 'Neutrophil_Percentage'), Text(0, 25, 'WBC_Count'), Text(0, 26, 'Length_of_Stay'), Text(0, 27, 'Alvarado_Score'), Text(0, 28, 'Segmented_Neutrophils'), Text(0, 29, 'Appendix_Diameter')])
plt.tight_layout()
plt.show()
# Determine subplot layout
n = len(numerical_cols)
cols = 4
rows = (n + cols - 1) // cols
# Create figure
fig, axes = plt.subplots(rows, cols, figsize=(cols * 4.5, rows * 3))
axes = axes.flatten()
# Plot histograms with log scaling for skewed features
for i, col in enumerate(numerical_cols):
ax = axes[i]
data = df[col].dropna()
# Log-transform highly skewed features for readability
if data.skew() > 2:
data = np.log1p(data)
ax.set_title(f"{col} (log)")
else:
ax.set_title(col)
ax.hist(data, bins=30, color='steelblue', edgecolor='black')
ax.set_ylabel('Frequency')
# Remove any empty plots
for j in range(i + 1, len(axes)):
fig.delaxes(axes[j])
fig.suptitle("Distribution of Numerical Variables", fontsize=16)
plt.subplots_adjust(top=0.94, hspace=0.6, wspace=0.4)
plt.show()
# Select numerical columns
numerical_cols = df.select_dtypes(include=['float64', 'int64']).columns.tolist()
# Remove outliers using IQR method for each numerical column
df_clean = df.copy()
for col in numerical_cols:
if df_clean[col].isnull().all():
continue # skip columns with all nulls
Q1 = df_clean[col].quantile(0.25)
Q3 = df_clean[col].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
df_clean = df_clean[(df_clean[col].isnull()) | ((df_clean[col] >= lower_bound) & (df_clean[col] <= upper_bound))]
df_clean.head()
## Age BMI ... Age_Group Diagnosis_Binary
## 6 8.98 19.4 ... 5-10 0.0
## 9 14.34 14.9 ... 11-15 1.0
## 10 11.87 15.7 ... 11-15 1.0
## 11 16.28 20.5 ... 16-20 0.0
## 12 9.40 16.6 ... 5-10 0.0
##
## [5 rows x 69 columns]
Model Development
# Required imports
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, roc_auc_score, roc_curve, RocCurveDisplay
import matplotlib.pyplot as plt
df_clean = df_clean.dropna(subset=['Age_Group'])
df_clean = df_clean[df_clean['Diagnosis_Binary'].isin([0, 1])]
# Encode categorical features
categorical_cols = df.select_dtypes(include='object').columns.difference(['Diagnosis'])
df_encoded = df.copy()
label_encoders = {}
for col in categorical_cols:
le = LabelEncoder()
df_encoded[col] = le.fit_transform(df_encoded[col].astype(str))
label_encoders[col] = le
df_encoded = df_encoded.dropna(subset=['Age_Group'])
df_encoded = df_encoded[df_encoded['Diagnosis_Binary'].isin([0, 1])]
# Define features and target
X = df_encoded.drop(columns=['Diagnosis_Binary'])
y = df_encoded['Diagnosis_Binary']
In this code, we drop a set of columns from the feature set X because they have been replaced with alternative versions that retain their informational value in a more structured way. Specifically, high-missingness features such as 'Appendicular_Abscess' and 'Target_Sign' were replaced by their corresponding binary "_reported" indicators, which capture whether the information was available at all. Similarly, 'Age_Group' is a binned version of 'Age', and 'Diagnosis' is excluded because it serves as the target variable in the classification task. This step helps streamline the dataset, reduce noise from missing values, and avoid data leakage during model training.
columns_to_drop = [
'Appendicular_Abscess', 'Abscess_Location', 'Conglomerate_of_Bowel_Loops',
'Segmented_Neutrophils', 'Appendicolith', 'Coprostasis', 'Perforation', 'Meteorism',
'Lymph_Nodes_Location', 'Target_Sign', 'Bowel_Wall_Thickening', 'Age_Group', 'Diagnosis'
]
# Reassign to ensure the columns are dropped
X = X.drop(columns=columns_to_drop, errors='ignore')
# Confirm
print("Remaining columns:", X.columns.tolist())
## Remaining columns: ['Age', 'BMI', 'Sex', 'Height', 'Weight', 'Length_of_Stay', 'Alvarado_Score', 'Paedriatic_Appendicitis_Score', 'Appendix_on_US', 'Appendix_Diameter', 'Migratory_Pain', 'Lower_Right_Abd_Pain', 'Contralateral_Rebound_Tenderness', 'Coughing_Pain', 'Nausea', 'Loss_of_Appetite', 'Body_Temperature', 'WBC_Count', 'Neutrophil_Percentage', 'Neutrophilia', 'RBC_Count', 'Hemoglobin', 'RDW', 'Thrombocyte_Count', 'Ketones_in_Urine', 'RBC_in_Urine', 'WBC_in_Urine', 'CRP', 'Dysuria', 'Stool', 'Peritonitis', 'Psoas_Sign', 'Ipsilateral_Rebound_Tenderness', 'US_Performed', 'Free_Fluids', 'Appendix_Wall_Layers', 'Perfusion', 'Surrounding_Tissue_Reaction', 'Pathological_Lymph_Nodes', 'Ileus', 'Enteritis', 'Gynecological_Findings', 'Appendicular_Abscess_reported', 'Abscess_Location_reported', 'Conglomerate_of_Bowel_Loops_reported', 'Ileus_reported', 'Segmented_Neutrophils_reported', 'Enteritis_reported', 'Appendicolith_reported', 'Coprostasis_reported', 'Perforation_reported', 'Meteorism_reported', 'Lymph_Nodes_Location_reported', 'Target_Sign_reported', 'Bowel_Wall_Thickening_reported']
#Split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=42)
# Classifiers to test
classifiers = {
'Logistic Regression': LogisticRegression(max_iter=1000),
'Random Forest': RandomForestClassifier(random_state=42),
'Support Vector Machine': SVC(probability=True),
'Decision Tree': DecisionTreeClassifier(random_state=42)
}
# Initialize ROC plot
plt.figure(figsize=(10, 8))
# Evaluate each model
for name, model in classifiers.items():
pipeline = Pipeline([
('scaler', StandardScaler()),
('classifier', model)
])
pipeline.fit(X_train, y_train)
## Pipeline(steps=[('scaler', StandardScaler()),
##                 ('classifier', DecisionTreeClassifier(random_state=42))])
y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]
# Print metrics
print(f"\n--- {name} ---")
##
## --- Decision Tree ---
print(classification_report(y_test, y_pred))
## precision recall f1-score support
##
## 0.0 0.93 0.89 0.91 63
## 1.0 0.93 0.96 0.94 93
##
## accuracy 0.93 156
## macro avg 0.93 0.92 0.93 156
## weighted avg 0.93 0.93 0.93 156
print(f"ROC AUC Score: {roc_auc_score(y_test, y_proba):.4f}")
## ROC AUC Score: 0.9229
This analysis uses four baseline classification models (Logistic Regression, Random Forest, Support Vector Machine (SVM), and Decision Tree) to predict appendicitis using the cleaned and preprocessed dataset. Each model is embedded in a pipeline that applies mean imputation for missing values and standard scaling; the categorical features were label-encoded beforehand.
The evaluation shows that all four models performed well, with the Random Forest achieving the highest overall accuracy (95%) and ROC AUC score (0.9805), followed closely by SVM (ROC AUC: 0.9660) and Logistic Regression (ROC AUC: 0.9636). Decision Tree also performed strongly, though with a slightly lower AUC of 0.8858. These results demonstrate that the models are effectively learning patterns in the data, and Random Forest in particular shows excellent balance between precision and recall, making it a strong candidate for further tuning or deployment.
# Define preprocessing pipeline (imputation + scaling)
preprocessor = Pipeline([
('imputer', SimpleImputer(strategy='mean')),
('scaler', StandardScaler())
])
# Fit and evaluate models
plt.figure(figsize=(10, 8))
for name, model in classifiers.items():
pipeline = Pipeline([
('preprocess', preprocessor),
('classifier', model)
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]
# Print classification metrics
print(f"--- {name} ---")
print(classification_report(y_test, y_pred))
print(f"ROC AUC Score: {roc_auc_score(y_test, y_proba):.4f}\n")
# Plot ROC curve
fpr, tpr, _ = roc_curve(y_test, y_proba)
plt.plot(fpr, tpr, label=f'{name} (AUC = {roc_auc_score(y_test, y_proba):.2f})')
## Pipeline(steps=[('preprocess',
## Pipeline(steps=[('imputer', SimpleImputer()),
## ('scaler', StandardScaler())])),
## ('classifier', LogisticRegression(max_iter=1000))])
## --- Logistic Regression ---
## precision recall f1-score support
##
## 0.0 0.88 0.83 0.85 63
## 1.0 0.89 0.92 0.91 93
##
## accuracy 0.88 156
## macro avg 0.88 0.88 0.88 156
## weighted avg 0.88 0.88 0.88 156
##
## ROC AUC Score: 0.9636
##
## [<matplotlib.lines.Line2D object at 0x0000021671A19ED0>]
## Pipeline(steps=[('preprocess',
## Pipeline(steps=[('imputer', SimpleImputer()),
## ('scaler', StandardScaler())])),
## ('classifier', RandomForestClassifier(random_state=42))])
## --- Random Forest ---
## precision recall f1-score support
##
## 0.0 0.95 0.92 0.94 63
## 1.0 0.95 0.97 0.96 93
##
## accuracy 0.95 156
## macro avg 0.95 0.94 0.95 156
## weighted avg 0.95 0.95 0.95 156
##
## ROC AUC Score: 0.9805
##
## [<matplotlib.lines.Line2D object at 0x000002167A055550>]
## Pipeline(steps=[('preprocess',
## Pipeline(steps=[('imputer', SimpleImputer()),
## ('scaler', StandardScaler())])),
## ('classifier', SVC(probability=True))])
## --- Support Vector Machine ---
## precision recall f1-score support
##
## 0.0 0.89 0.90 0.90 63
## 1.0 0.93 0.92 0.93 93
##
## accuracy 0.92 156
## macro avg 0.91 0.91 0.91 156
## weighted avg 0.92 0.92 0.92 156
##
## ROC AUC Score: 0.9660
##
## [<matplotlib.lines.Line2D object at 0x0000021676E5A950>]
## Pipeline(steps=[('preprocess',
## Pipeline(steps=[('imputer', SimpleImputer()),
## ('scaler', StandardScaler())])),
## ('classifier', DecisionTreeClassifier(random_state=42))])
## --- Decision Tree ---
## precision recall f1-score support
##
## 0.0 0.91 0.83 0.87 63
## 1.0 0.89 0.95 0.92 93
##
## accuracy 0.90 156
## macro avg 0.90 0.89 0.89 156
## weighted avg 0.90 0.90 0.90 156
##
## ROC AUC Score: 0.8858
##
## [<matplotlib.lines.Line2D object at 0x0000021676D99590>]
# Final plot adjustments
plt.plot([0, 1], [0, 1], 'k--')
## [<matplotlib.lines.Line2D object at 0x0000021671253410>]
plt.title("ROC Curve Comparison")
## Text(0.5, 1.0, 'ROC Curve Comparison')
plt.xlabel("False Positive Rate")
## Text(0.5, 0, 'False Positive Rate')
plt.ylabel("True Positive Rate")
## Text(0, 0.5, 'True Positive Rate')
plt.legend(loc="lower right")
## <matplotlib.legend.Legend object at 0x0000021671964990>
plt.grid(True)
plt.tight_layout()
plt.show()
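For reference, the categorical variables above were label-encoded up front; an alternative is to keep imputation, scaling, and one-hot encoding inside the pipeline with a ColumnTransformer. The sketch below illustrates that idea starting again from the raw feature table; it is not the preprocessing actually used in this report.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Start from the raw (un-encoded) features so the categorical columns are still strings
X_raw = regensburg_pediatric_appendicitis.data.features
y_raw = regensburg_pediatric_appendicitis.data.targets['Diagnosis']
mask = y_raw.notna()  # keep only rows with a recorded diagnosis
X_raw, y_raw = X_raw[mask], y_raw[mask]

numeric_cols_raw = X_raw.select_dtypes(include=['float64', 'int64']).columns
categorical_cols_raw = X_raw.select_dtypes(include=['object']).columns

# Numeric columns: mean imputation + scaling; categorical columns: mode imputation + one-hot encoding
preprocess_ct = ColumnTransformer([
    ('num', Pipeline([('impute', SimpleImputer(strategy='mean')),
                      ('scale', StandardScaler())]), numeric_cols_raw),
    ('cat', Pipeline([('impute', SimpleImputer(strategy='most_frequent')),
                      ('onehot', OneHotEncoder(handle_unknown='ignore'))]), categorical_cols_raw),
])

clf_ct = Pipeline([('preprocess', preprocess_ct),
                   ('classifier', LogisticRegression(max_iter=1000))])
print(cross_val_score(clf_ct, X_raw, y_raw, cv=5).mean())  # rough cross-validated accuracy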
# Import required libraries
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
import pandas as pd
# Define classifiers and their parameter grids
param_grid = {
'Logistic Regression': {
'classifier__C': [0.1, 1, 10],
'classifier__solver': ['lbfgs']
},
'Random Forest': {
'classifier__n_estimators': [100, 200],
'classifier__max_depth': [None, 10, 20]
},
'SVM': {
'classifier__C': [0.1, 1, 10],
'classifier__kernel': ['linear', 'rbf']
},
'Decision Tree': {
'classifier__max_depth': [None, 5, 10],
'classifier__criterion': ['gini', 'entropy']
}
}
# Classifiers to search over
classifiers = {
'Logistic Regression': LogisticRegression(max_iter=1000),
'Random Forest': RandomForestClassifier(random_state=42),
'SVM': SVC(probability=True),
'Decision Tree': DecisionTreeClassifier(random_state=42)
}
# Store best models and results
best_models = {}
results = {}
# Run grid search for each classifier
for name, clf in classifiers.items():
print(f"Running GridSearchCV for {name}...")
pipe = Pipeline([
('imputer', SimpleImputer(strategy='mean')), # handles NaNs
('scaler', StandardScaler()), # scales numeric features
('classifier', clf)]) # plugs in the classifier
grid = GridSearchCV(pipe, param_grid[name], cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X, y)
best_models[name] = grid.best_estimator_
results[name] = grid.best_score_
print(f"Best score for {name}: {grid.best_score_:.4f}")
print(f"Best parameters: {grid.best_params_}")
print("-" * 40)
## Running GridSearchCV for Logistic Regression...
## GridSearchCV(cv=5,
## estimator=Pipeline(steps=[('imputer', SimpleImputer()),
## ('scaler', StandardScaler()),
## ('classifier',
## LogisticRegression(max_iter=1000))]),
## n_jobs=-1,
## param_grid={'classifier__C': [0.1, 1, 10],
## 'classifier__solver': ['lbfgs']},
## scoring='accuracy')
## Best score for Logistic Regression: 0.8705
## Best parameters: {'classifier__C': 10, 'classifier__solver': 'lbfgs'}
## ----------------------------------------
## Running GridSearchCV for Random Forest...
## GridSearchCV(cv=5,
## estimator=Pipeline(steps=[('imputer', SimpleImputer()),
## ('scaler', StandardScaler()),
## ('classifier',
## RandomForestClassifier(random_state=42))]),
## n_jobs=-1,
## param_grid={'classifier__max_depth': [None, 10, 20],
## 'classifier__n_estimators': [100, 200]},
## scoring='accuracy')
## Best score for Random Forest: 0.9141
## Best parameters: {'classifier__max_depth': None, 'classifier__n_estimators': 100}
## ----------------------------------------
## Running GridSearchCV for SVM...
## GridSearchCV(cv=5,
## estimator=Pipeline(steps=[('imputer', SimpleImputer()),
## ('scaler', StandardScaler()),
## ('classifier', SVC(probability=True))]),
## n_jobs=-1,
## param_grid={'classifier__C': [0.1, 1, 10],
## 'classifier__kernel': ['linear', 'rbf']},
## scoring='accuracy')
## Best score for SVM: 0.8782
## Best parameters: {'classifier__C': 10, 'classifier__kernel': 'linear'}
## ----------------------------------------
## Running GridSearchCV for Decision Tree...
## GridSearchCV(cv=5,
## estimator=Pipeline(steps=[('imputer', SimpleImputer()),
## ('scaler', StandardScaler()),
## ('classifier',
## DecisionTreeClassifier(random_state=42))]),
## n_jobs=-1,
## param_grid={'classifier__criterion': ['gini', 'entropy'],
## 'classifier__max_depth': [None, 5, 10]},
## scoring='accuracy')
## Best score for Decision Tree: 0.9179
## Best parameters: {'classifier__criterion': 'entropy', 'classifier__max_depth': None}
## ----------------------------------------
results
## {'Logistic Regression': np.float64(0.8705128205128204), 'Random Forest': np.float64(0.9141025641025642), 'SVM': np.float64(0.8782051282051283), 'Decision Tree': np.float64(0.9179487179487179)}
To improve model performance, we applied GridSearchCV with predefined hyperparameter grids for four classifiers: Logistic Regression, Random Forest, Support Vector Machine (SVM), and Decision Tree. This process allowed us to identify the best combination of hyperparameters for each model using cross-validation.
The results show that the Decision Tree achieved the highest cross-validation accuracy (0.9179) with criterion='entropy' and unlimited depth, closely followed by the Random Forest (0.9141, with n_estimators=100 and max_depth=None) and the SVM (0.8782, best with a linear kernel and C=10). Logistic Regression performed well too (0.8705) with weaker regularization (C=10). On the held-out test set, these improvements are reflected in the ROC curve, where the Random Forest demonstrated the best AUC (0.98), suggesting it is the most effective classifier for this pediatric appendicitis prediction task.
# Initialize ROC plot
plt.figure(figsize=(10, 8))
## <Figure size 1000x800 with 0 Axes>
# Plot ROC curve for each best model
for name, model in best_models.items():
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_proba)
auc_score = roc_auc_score(y_test, y_proba)
plt.plot(fpr, tpr, label=f'{name} (AUC = {auc_score:.2f})')
## Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler()),
## ('classifier', LogisticRegression(C=10, max_iter=1000))])
## [<matplotlib.lines.Line2D object at 0x0000021675169F10>]
## Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler()),
## ('classifier', RandomForestClassifier(random_state=42))])
## [<matplotlib.lines.Line2D object at 0x0000021676E02290>]
## Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler()),
## ('classifier', SVC(C=10, kernel='linear', probability=True))])
## [<matplotlib.lines.Line2D object at 0x0000021676C37F10>]
## Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler()),
## ('classifier',
## DecisionTreeClassifier(criterion='entropy', random_state=42))])
## [<matplotlib.lines.Line2D object at 0x000002167514A6D0>]
# Add reference line and labels
plt.plot([0, 1], [0, 1], 'k--', label='Random Guess')
## [<matplotlib.lines.Line2D object at 0x000002167514A690>]
plt.title("ROC Curve Comparison of Best Models")
## Text(0.5, 1.0, 'ROC Curve Comparison of Best Models')
plt.xlabel("False Positive Rate")
## Text(0.5, 0, 'False Positive Rate')
plt.ylabel("True Positive Rate")
## Text(0, 0.5, 'True Positive Rate')
plt.legend(loc="lower right")
## <matplotlib.legend.Legend object at 0x0000021676BD4390>
plt.grid(True)
plt.tight_layout()
plt.show()
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# Plot confusion matrices
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()
for idx, (name, model) in enumerate(best_models.items()):
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['No Appendicitis', 'Appendicitis'])
disp.plot(ax=axes[idx], cmap='Blues', colorbar=False)
axes[idx].set_title(f'{name} - Confusion Matrix')
## Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler()),
## ('classifier', LogisticRegression(C=10, max_iter=1000))])
## <sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay object at 0x0000021676F67C90>
## Text(0.5, 1.0, 'Logistic Regression - Confusion Matrix')
## Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler()),
## ('classifier', RandomForestClassifier(random_state=42))])
## <sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay object at 0x0000021676DF0790>
## Text(0.5, 1.0, 'Random Forest - Confusion Matrix')
## Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler()),
## ('classifier', SVC(C=10, kernel='linear', probability=True))])
## <sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay object at 0x0000021676D82910>
## Text(0.5, 1.0, 'SVM - Confusion Matrix')
## Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler()),
## ('classifier',
## DecisionTreeClassifier(criterion='entropy', random_state=42))])
## <sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay object at 0x0000021676B040D0>
## Text(0.5, 1.0, 'Decision Tree - Confusion Matrix')
plt.tight_layout()
plt.show()
The Random Forest model demonstrated the highest precision and recall, with the fewest errors, indicating it’s the most reliable for distinguishing between appendicitis and non-appendicitis cases. Other models performed well but showed slightly higher misclassification rates, particularly in distinguishing false positives or negatives. These confusion matrices validate the ROC and accuracy metrics and provide a clearer picture of how each model handles real-world classification errors.
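As a compact wrap-up, a small sketch that gathers the test-set accuracy and ROC AUC of the tuned models (using the best_models, X_test, and y_test objects defined above) into one comparison table:
from sklearn.metrics import accuracy_score, roc_auc_score

# Collect test-set metrics for the tuned models into a single comparison table
rows = []
for name, model in best_models.items():
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1]
    rows.append({'Model': name,
                 'Accuracy': round(accuracy_score(y_test, y_pred), 3),
                 'ROC_AUC': round(roc_auc_score(y_test, y_proba), 3)})
summary_df = pd.DataFrame(rows).sort_values('ROC_AUC', ascending=False)
print(summary_df.to_string(index=False))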