The Gini Coefficient under Binary Classification Problems

Harel Lustiger
October, 2017

Introduction

Real World Example

New Zealand's Jacinda Ardern sets out priorities: climate, inequality and women.

What should we measure to determine whether social inequality was tackled successfully by the end of the PM's term?

→ Suggestion: The Gini Coefficient

Graphical representation of Gini: Lorenz curve, Graeme Hart and Me

Projecting the 2D space into a scalar metric (a.k.a. Gini Coefficient)

GiniCoeff <- function(solution, submission){

    df <- data.frame(solution = solution, submission = submission)
    # Sort the examples from the highest to the lowest score
    df <- df[order(df$submission, decreasing = TRUE), ]
    df$uniform <- (1:nrow(df)) / nrow(df) # = (1/n, 2/n, ..., 1)
    totalPos <- sum(df$solution) # how many times does '1' appear in reality?

    # Cumulative number of positive examples found
    # (used for computing the model's Lorenz curve)
    df$cumPosFound <- cumsum(df$solution)

    # Cumulative proportion of positive examples found
    # (the model's Lorenz curve)
    df$Lorenz <- df$cumPosFound / totalPos

    # Lorenz curve minus the uniform (diagonal) line
    df$Gini <- df$Lorenz - df$uniform

    return(sum(df$Gini))
}

To learn more about “Estimation of the Gini coefficient” see this link (p. 14)

Normalized Gini Coefficient Attributes

NormGiniCoeff <- function(solution, submission){

    GiniCoeff(solution, submission) / GiniCoeff(solution, solution)

}
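The attributes of the normalized coefficient can be checked on a toy example. The sketch below restates the two functions in compact form so that it runs on its own; the vectors `truth` and `scores` are made-up illustrative data, not competition data. It checks that a perfect ranking scores 1, a fully reversed ranking scores -1, and an order-preserving transformation of the scores leaves the value unchanged.

```r
# Compact restatement of GiniCoeff / NormGiniCoeff (same computation as above)
GiniCoeff <- function(solution, submission) {
    # True labels sorted from the highest to the lowest score
    sorted <- solution[order(submission, decreasing = TRUE)]
    n <- length(sorted)
    # Sum of (model Lorenz curve - uniform line)
    sum(cumsum(sorted) / sum(sorted) - seq_len(n) / n)
}
NormGiniCoeff <- function(solution, submission) {
    GiniCoeff(solution, submission) / GiniCoeff(solution, solution)
}

# Hypothetical labels and scores (no tied scores)
truth  <- c(0, 0, 1, 0, 1, 1)
scores <- c(0.10, 0.40, 0.35, 0.80, 0.70, 0.90)

perfect  <- NormGiniCoeff(truth, truth)             # all positives ranked first
reversed <- NormGiniCoeff(truth, -truth)            # all positives ranked last
original <- NormGiniCoeff(truth, scores)
rescaled <- NormGiniCoeff(truth, 100 * scores - 7)  # same order, new scale

print(c(perfect = perfect, reversed = reversed,
        original = original, rescaled = rescaled))
```

Only the ordering induced by the scores enters the computation, which is why rescaling has no effect.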

Context

Featured Prediction Competition:
Porto Seguro's Safe Driver Prediction.

Problem:
Predict if a driver will file an insurance claim next year.

Objective:
Maximize the (Normalized) Gini Coefficient.

Prediction Type:
Classification with emphasis on scoring classifiers.

Scoring Classifiers

What is Scoring?

Given 6 members from the insurance company database and their true future claim results, we observe the outputs of two algorithms:

Member Name   Ground Truth   1st Algorithm Score   2nd Algorithm Score
A             No             1                     3
B             No             2                     2
C             No             3                     1
D             Yes            4                     6
E             Yes            5                     4
F             Yes            6                     5

Do the two algorithms yield the same Gini coefficient? YES, both score every 'Yes' member above every 'No' member.

Are the two algorithms' ranks the same? NO, the order within each class differs, but who cares?!
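The claim can be verified on the six members from the table. This is a minimal sketch: the Gini functions are restated compactly so the block is self-contained, and the data frame `members` and its column names are introduced here for illustration only.

```r
# Compact restatement of the deck's Gini functions
GiniCoeff <- function(solution, submission) {
    sorted <- solution[order(submission, decreasing = TRUE)]
    sum(cumsum(sorted) / sum(sorted) - seq_along(sorted) / length(sorted))
}
NormGiniCoeff <- function(solution, submission) {
    GiniCoeff(solution, submission) / GiniCoeff(solution, solution)
}

# The six members from the table (No = 0, Yes = 1)
members <- data.frame(
    name   = c("A", "B", "C", "D", "E", "F"),
    truth  = c(0, 0, 0, 1, 1, 1),
    score1 = c(1, 2, 3, 4, 5, 6),
    score2 = c(3, 2, 1, 6, 4, 5)
)

gini1 <- NormGiniCoeff(members$truth, members$score1)
gini2 <- NormGiniCoeff(members$truth, members$score2)

# Do the two algorithms order the members identically?
same_ranking <- identical(order(members$score1), order(members$score2))

print(c(gini1 = gini1, gini2 = gini2))
print(same_ranking)
```

Both algorithms reach the maximum normalized value even though `same_ranking` is `FALSE`, because each places all three 'Yes' members ahead of all three 'No' members.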

Scoring: Takeaways

  • The algorithms need not rank all the instances in a desired order. What is important is that the positive instances are generally ranked higher than the negative ones.
  • The scores need not be in any predefined intervals, nor be likelihoods or probabilities over class memberships.
  • The overall hope is that the classifier typically scores the positive examples higher than the negative examples.

→ The door for regression algorithms and their objective functions is open.

Key Attributes of the Target Variable

  • The target variable has two states, '0' and '1'.
  • The target variable is imbalanced, and the class we are eager to find ('1') is the rare one.

                    0         1
    Frequency       573518    21694
    Proportion      96%       4%

Why is it important?

Supervised learning rests on the assumption that the initial training set is going to produce a useful model for discriminating between classes.

Furthermore, in their basic form, most classifiers do not behave well on imbalanced data sets. Instead, they tend to favor the class with the greater proportion of examples.
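To make the imbalance concrete, the short sketch below uses the class frequencies reported above and a hypothetical baseline that predicts '0' for every driver. The baseline is an illustration introduced here, not a model from the experiment.

```r
# Class frequencies of the target variable, as reported above
freq <- c("0" = 573518, "1" = 21694)
prop <- freq / sum(freq)

# Hypothetical baseline: predict '0' for every driver
baseline_accuracy <- unname(prop["0"])  # share of correct predictions
baseline_recall   <- 0                  # share of actual claims flagged

print(round(100 * prop, 1))
print(round(baseline_accuracy, 3))
```

Such a baseline is right for roughly 96% of the drivers while finding none of the claims, which is one reason a ranking-based measure such as Gini is more informative here than plain accuracy.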

Gini and Other Performance Measures in Practice

Experimental Setup

  • Data Splitting - The train set is split into 4 unequal parts:
      • (10%) training set - used for fitting xgboost models.
      • (10%) evaluation set - given a performance measure (such as AUC), the model performance is evaluated based on this set. If there is no improvement for 10 rounds, the training stops.
      • (10%) test set - used as unseen data for assessing the different models.
      • (70%) validation set - not in use at all.

  • Data Preprocessing
      • Remove all rows containing missing values.
      • Remove all columns that are not continuous variables (incl. ordered factors).
      • → The train, eval and test sets all have 12 variables and around 40,000 rows.
  • Model Fitting
      • For each performance measure, 200 bootstrap xgboost models are fitted.
      • Before each run, the train set is sampled with replacement (i.e. a bootstrap sample).
      • The other sets (eval and test) remain unchanged.
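The splitting and resampling steps can be sketched in base R as follows. This is only an illustration under stated assumptions: the row count `n`, the seed, and the names `sizes`, `idx` and `bs_rows` are hypothetical and are not taken from the original experiment.

```r
set.seed(1)   # arbitrary seed, for reproducibility of the sketch only
n <- 1000     # hypothetical number of rows in the full train set

# Number of rows per part: 10% / 10% / 10% / 70%
sizes <- round(n * c(train = 0.1, eval = 0.1, test = 0.1, validation = 0.7))

# Randomly assign every row index to exactly one part
part <- sample(rep(names(sizes), times = sizes))
idx  <- split(seq_len(n), part)

# One bootstrap replicate: resample the training rows with replacement.
# The eval and test rows are left untouched.
bs_rows <- sample(idx$train, size = length(idx$train), replace = TRUE)

print(lengths(idx))
print(length(unique(bs_rows)))  # fewer distinct rows than length(idx$train)
```

Repeating the last step 200 times, once per model, would give the bootstrap replicates described in the list, with the evaluation and test indices held fixed across runs.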

  • Parameter Tuning - The parameters were chosen in accordance with Anton Aksyonov's suggestions, which can be found here.
model <- xgb.train(data=dX_bs,
                   # Parameter for Tree Booster
                   max_depth=6,
                   eta=0.02,
                   gamma=1,
                   subsample=0.95,
                   colsample_bytree=0.8,
                   min_child_weight=20,
                   # Early Stopping to Avoid Overfitting
                   nrounds=1000,
                   early_stopping_rounds=10,
                   watchlist=list(train=dX_bs, test=dX_ev),
                   # Task Parameters
                   objective="binary:logistic",
                   eval_metric=eval_metric, 
                   # eval_metric \in {"error","auc","map","rmse"}
                   seed=2145)

1st Experiment

Correlation plots between Gini and the selected performance measures

[Figure: correlation plots]

\[ \text{AUC}=(\text{Gini}+1)/2 \quad\leftrightarrow\quad \text{Gini}=2\times \text{AUC}-1 \]
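The identity can be checked numerically for the normalized Gini coefficient. The sketch below restates the Gini functions compactly and adds a rank-based (Mann-Whitney) AUC helper; the helper `AUC`, the simulated data and the seed are assumptions made for illustration. The check assumes there are no tied scores, since ties are handled differently by the two computations.

```r
# Compact restatement of the deck's Gini functions
GiniCoeff <- function(solution, submission) {
    sorted <- solution[order(submission, decreasing = TRUE)]
    sum(cumsum(sorted) / sum(sorted) - seq_along(sorted) / length(sorted))
}
NormGiniCoeff <- function(solution, submission) {
    GiniCoeff(solution, submission) / GiniCoeff(solution, solution)
}

# Illustrative helper: AUC from the rank sum of the positive examples
AUC <- function(solution, submission) {
    n_pos <- sum(solution == 1)
    n_neg <- sum(solution == 0)
    rank_sum <- sum(rank(submission)[solution == 1])
    (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Simulated labels and continuous scores (ties are practically impossible)
set.seed(1)
truth  <- rbinom(500, size = 1, prob = 0.2)
scores <- runif(500) + 0.3 * truth

auc  <- AUC(truth, scores)
gini <- NormGiniCoeff(truth, scores)

print(c(auc = auc, gini = gini, two_auc_minus_one = 2 * auc - 1))
```

Because one measure is an affine function of the other, early stopping on AUC and early stopping on normalized Gini select the same round whenever the scores are free of ties.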

2nd Experiment

Boxplots of the selected performance measures that were used to stop xgboost training

[Figure: boxplots]