March 11, 2020

Evaluating a Model

It is important to consider carefully what a reasonable baseline would be against which to compare model performance.

Baseline approaches for classification tasks

  • Majority classifier: a naive classifier that always predicts the majority class of the training dataset (a minimal sketch follows this list)
  • Decision Stump: Decision tree with only one internal node, the root node
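
As a quick illustration of the majority-class baseline, the following R sketch computes the baseline accuracy any model should beat. It uses the built-in iris data and a hypothetical 70/30 split, not the credit data analyzed later:

# Majority-class baseline on a hypothetical 70/30 split of the built-in iris data
set.seed(1)
idx <- sample(nrow(iris), 0.7 * nrow(iris))
trn <- iris[idx, ]
tst <- iris[-idx, ]
majority <- names(which.max(table(trn$Species)))  # most frequent training class
mean(tst$Species == majority)                     # baseline accuracy to beat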

Decision Stump

Tree induction selects the single most informative feature to make a decision.

Very Simple Classification Rules Perform Well on Most Commonly Used Datasets, Robert Holte (1993)

The specific kind of rules examined in this article, called “1-rules,” are rules that classify an object on the basis of a single attribute (i.e., they are 1-level decision trees).
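
Holte's 1-rules are implemented in RWeka as the OneR classifier. A minimal sketch on the built-in iris data (an illustrative dataset, not the one analyzed later) might look like this:

library(RWeka)

# OneR builds a rule set on the single most predictive attribute (a 1-rule)
one_rule <- OneR(Species ~ ., data = iris)
print(one_rule)    # shows the chosen attribute and its rule
summary(one_rule)  # resubstitution accuracy on the training data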

Not all real-world problems are hard problems, so even a very simple classifier can prove valuable.

When measuring the value of an algorithm, consider the full cost of processing and data collection; once all costs are accounted for, a simple classifier can be the more financially beneficial choice.

Weka Algorithms

DecisionStump implements decision stump classification (trees with a single split only). Decision stumps are frequently used as base learners for meta-learners such as boosting (a minimal sketch follows the bullet below).

  • For non-binary classification, the stump has two leaves: one corresponds to a chosen category and the other to all remaining categories
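
As noted above, a stump is often used as the weak learner in boosting. A minimal RWeka sketch of boosted stumps, again on the illustrative iris data, could look like this:

library(RWeka)

# Boosted decision stumps: AdaBoostM1 with DecisionStump as the weak learner
boosted <- AdaBoostM1(Species ~ ., data = iris,
                      control = Weka_control(W = "weka.classifiers.trees.DecisionStump"))
summary(boosted)  # training-set evaluation of the boosted ensemble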

InfoGainAttributeEval evaluates the worth of an attribute by measuring the information gain with respect to the class.

  • A feature selection step that identifies the attributes contributing most to decreasing the overall entropy (illustrated below)
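
For intuition, information gain can be computed by hand as the reduction in class entropy after splitting on an attribute. A small R sketch, illustrative only and using a discretized iris attribute rather than Weka's internal routine, might look like this:

# Entropy of a class vector and information gain of splitting on attribute x
entropy <- function(y) {
  p <- prop.table(table(y))
  p <- p[p > 0]
  -sum(p * log2(p))
}
info_gain <- function(x, y) {
  entropy(y) - sum(prop.table(table(x)) * tapply(y, x, entropy))
}
# Gain of a (discretized) petal length for predicting the iris species
info_gain(cut(iris$Petal.Length, 3), iris$Species)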

Dataset

Population: Individuals with accepted credit card applications

Target variable: Defaulted (Yes or No)

Input variables:

  • Age = Age in years plus twelfths of a year

  • Adepcnt = 1 + number of dependents

  • Acadmos = months living at current address

  • Majordrg = Number of major derogatory reports

  • Minordrg = Number of minor derogatory reports

  • Ownrent = 1 if owns their home, 0 if rent

  • Income = Monthly income (divided by 10,000)

  • Selfempl = 1 if self employed, 0 if not

  • Inc_per = Income divided by number of dependents

  • Exp_Inc = Ratio of monthly credit card expenditure to yearly income
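
The code snippet in the next section assumes that a model formula fml and train/test data frames already exist. A minimal sketch of that preparation is shown here; the file name, split ratio, and column casing are assumptions consistent with the variable list above, not taken from the source:

# Hypothetical preparation of the objects used in the snippet below
df <- read.csv("credit_count.csv")         # assumed file of accepted applications
df$DEFAULT <- as.factor(df$DEFAULT)        # Weka classifiers need a factor target
set.seed(2020)
idx   <- sample(nrow(df), 0.7 * nrow(df))  # assumed 70/30 split
train <- df[idx, ]
test  <- df[-idx, ]
fml   <- DEFAULT ~ AGE + ACADMOS + ADEPCNT + MAJORDRG + MINORDRG +
         OWNRENT + INCOME + SELFEMPL + INCPER + EXP_INC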

R Code Snippet

The code chunk below highlights the libraries RWeka and pROC along with the functions DecisionStump and InfoGainAttributeEval.

library(RWeka)
library(pROC)

# fml (model formula), train, and test are assumed to be prepared beforehand,
# e.g. as in the sketch at the end of the Dataset section.

# Weka tree classifier: Decision Stump
stump <- DecisionStump(fml, data = train)
# ROC curve for the stump's predicted probability of the positive class
roc2 <- roc(as.factor(test$DEFAULT), 
            predict(stump, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.5947

# Weka attribute selection: information gain of each attribute
imp <- InfoGainAttributeEval(fml, data = train)
# Use the single highest-gain attribute directly as the score
imp_x <- test[, names(imp[imp == max(imp)])]
roc1 <- roc(as.factor(test$DEFAULT), imp_x)
# Area under the curve: 0.6338

# Plot both ROC curves with pROC's ggroc
ggroc(list(Predictive.Attr = roc1, Decision.Stump = roc2), 
      aes = "linetype", color = "blue", legacy.axes = FALSE)

Both approaches selected Income as the input attribute, resulting in broadly similar performance (a paired comparison is sketched below).
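
To check whether the two AUCs actually differ, one could add a paired comparison with pROC's roc.test, which defaults to DeLong's test; this step is not part of the source post:

# Paired DeLong test comparing the two ROC curves built above
roc.test(roc1, roc2)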

Source: https://statcompute.wordpress.com/2016/01/01/the-power-of-decision-stumps/

AUC - ROC Curve

AUC: Area Under the Curve – ROC: Receiver Operating Characteristic curve

DecisionStump AUC: 0.5947 – InfoGainAttributeEval AUC: 0.6338

The Info Gain Attribute Evaluation approach outperforms the Decision Stump algorithm in this example.

Sensitivity = True Positive Rate

1 - Specificity = False Positive Rate

An AUC of 0.7 means there is a 70% chance that the model will rank a randomly chosen positive case above a randomly chosen negative case, i.e., correctly distinguish the positive class from the negative class.
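
This ranking interpretation can be checked numerically: the pairwise ranking estimate matches the AUC reported by pROC. A small self-contained R sketch with simulated scores (hypothetical data, not the credit set):

set.seed(3)
score <- runif(200)                             # hypothetical model scores
label <- rbinom(200, 1, plogis(4 * score - 2))  # hypothetical 0/1 outcomes
pos <- score[label == 1]
neg <- score[label == 0]
mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))  # pairwise ranking estimate
pROC::auc(label, score)                                   # AUC from pROC for comparison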

Conclusion

A data-driven model requires a valid point of comparison in order to evaluate its performance.

Decision Stump can provide a reasonable baseline for classification tasks.

A simple classifier may prove viable, and cheaper overall, compared to more complex classification algorithms.