Ensembles of classifiers

This work implements simple ensembles of classifiers. We use only one type of classifier, the Naive Bayes classifier implemented in the previous work. It is known that an ensemble can be more accurate than its component classifiers, provided those components each have accuracy greater than 0.5. There are several ways to construct ensembles: we can combine different types of classifiers (e.g. kNN and Naive Bayes), or we can take one type of classifier and manipulate the training set or the input attributes. In this work we only consider methods that manipulate the training set. The idea is to run the same algorithm multiple times on different subsets/samples of the same dataset. We will use two such approaches: cross-validation and bagging.
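
To make the 0.5 condition concrete, the following sketch (not part of the experiments below, and assuming independent errors, which real classifiers rarely have) computes the accuracy of a majority vote over k classifiers on a two-class problem:

# Probability that a majority of k independent classifiers, each with
# individual accuracy p, votes for the correct class (two-class problem).
p <- 0.6
k <- 11
majority_acc <- sum(dbinom(ceiling((k + 1) / 2):k, size = k, prob = p))
majority_acc   # roughly 0.75, noticeably better than the individual 0.6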

Load datasets

As usual we start with loading the datasets. We will use three: iris, wine and ionosphere. In each dataset the last column will be the target column (for wine we need to move the class column from first to last to make this hold).

data(iris)                                   # iris: the target (Species) is already the last column
wine <- read.csv("~/Studies/Year3/DataMining/wine.data", header=FALSE)
wine = cbind(wine[,-1],wine[,1])             # move the class column from first to last
colnames(wine)[ncol(wine)] = "target"
ion = read.table("~/Studies/Year3/Machine Learning/ionosphere.data",sep=",")

Next we’ll load the code for the Naive Bayes classifier from the previous work (the code is skipped in the print version).
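
For reference, that code is assumed to provide the two functions used throughout this work; the stub below documents only the calling convention assumed here, not the actual implementation.

# Assumed interface from the previous work (hypothetical stub, shown only to
# document the calling convention used below):
#   get_prob_tables(train, target_column_index)
#       builds the Naive Bayes probability tables from a (discretized) training set
#   classify(row, tables, th, ...)
#       classifies a single row given the tables and the discretization thresholds `th`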

Construct ensembles

Cross-validation

The idea of cross-validation is to break the original training set into N disjoint folds and then run the algorithm N times, each time leaving one fold out and training only on the remaining N-1 folds. The resulting training sets will, of course, overlap. To implement this approach we create a function that breaks the data into N folds and builds a Naive Bayes classifier on each set of N-1 folds. The function works with discretized data, so we will have to do some preprocessing before using it. To break the dataset into folds we use the caret library.

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
cross_val <- function(train, n_folds, target_column_index) {
    trainy = train[, target_column_index]
    # split the row indices into n_folds disjoint folds, stratified by the target
    folds <- createFolds(trainy, k = n_folds, list = TRUE, 
                         returnTrain = FALSE)
    tables <- vector("list", n_folds)
    i <- 1
    for (fold in folds) {
        # train on everything except the current fold
        trainfold = train[-fold,]
        pr_tables = get_prob_tables(trainfold, target_column_index)
        tables[[i]] = pr_tables
        i = i + 1
    }
    tables
}
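
As a hypothetical usage example (it assumes the get_prob_tables function from the previous work is already loaded), the function can be applied directly to a discretized copy of iris:

# Hypothetical usage sketch, assuming get_prob_tables() is available:
library(discretization)
irisD <- chiM(iris)$Disc.data            # discretize iris (class is the last column)
cv_tables <- cross_val(irisD, 10, ncol(irisD))
length(cv_tables)                        # 10 sets of probability tables, one per fold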

Bagging

The idea of bagging is as simple as cross-validation: take k bootstrap samples of the same size as the training set by sampling with replacement, and train a classifier on each. The result is again k classifiers, each represented by its probability tables.

bagging <- function(train, n_samples, target_column_index){
    tables <- vector("list", n_samples)
    N = nrow(train)
    for (i in 1:n_samples) {
        # bootstrap sample: N rows drawn with replacement
        selvec = sample(N, N, replace=TRUE)
        trainbag = train[selvec, ]
        pr_tables = get_prob_tables(trainbag, target_column_index)
        tables[[i]] = pr_tables
    }
    tables
}

Data Preprocessing: Train and Test Sets

The following code splits the dataset into test and train, performs discretization on the train set and returns a list of objects:

  • discretized train set
  • original (non-discretized) test set
  • list of thresholds obtained when discretizing the train set

library(caTools)          # for sample.split
library(discretization)   # for chiM (ChiMerge discretization)

split_and_discretize <- function(dset, train_ratio, discr = TRUE) {
    target_column_index = ncol(dset)
    target_column = dset[,target_column_index]
    # stratified split into train/test according to train_ratio
    split = sample.split(target_column, SplitRatio = train_ratio)
    train = subset(dset, split==T)
    test = subset(dset, split==F)
    if (discr) {
        # discretize the train set and keep the cut points (thresholds)
        trainD = chiM(train)
        train = trainD$Disc.data
        th = trainD$cutp
    }
    else {
        th = NULL
    }  
    list(train = train, test = test, th = th)
}

Now we can write a function that builds an ensemble and implements a simple voting procedure. In this case the winner is simply the class that gets the most votes, so the quality of each “expert” doesn’t matter: all classifiers are equal. The function returns the predictions for the given test set.

simple_static_voting <- function(tables, testX, n_sets = 10, 
                                 discr = TRUE, th = NULL, 
                                 construction_function) {
    predictions = c()
    for (t in tables) {
        # one column of predictions per classifier
        prediction = apply(testX,1,classify, t,th,1,discr)
        predictions = cbind(predictions, prediction)
    }
    # combine using simple voting: pick the most frequent label in each row
    preds = apply(predictions, 1, 
                  function(x) {names(sort(table(x),
                                          decreasing=TRUE)[1])})
    preds
}

Finally we can write the function that will run the experiment.

run_static_ensemble_experiment <- function(dset, train_ratio, 
                                           discr = TRUE, 
                                           construction_function) {
    dsets = split_and_discretize(dset, train_ratio, discr = discr)
    train = dsets$train
    test = dsets$test
    testX = test[,-ncol(test)]
    testy = test[,ncol(test)]
    th = dsets$th
    target_column_index = ncol(train)
    # build the ensemble (10 classifiers) and combine them by simple voting
    tables = construction_function(train, 10, target_column_index)
    preds = simple_static_voting(tables, testX, 10, discr, th, 
                                 construction_function)
    # accuracy on the test set
    sum(preds == testy)/length(testy)
}

Next we will create a dataframe to store the results of the experiments.

acc_table = data.frame(matrix(data = 0, nrow=6,ncol=3))
colnames(acc_table) = c("Iris", "Wine", "Ionosphere")
rownames(acc_table) = c("Static ensemble: CV",
                        "Static ensemble: Bagging",
                        "Weighted voting: CV",
                        "Weighted voting: Bagging",
                        "Weighted Majority: CV",
                        "Weighted Majority: Bagging")

And now we’ll finally run the experiments.

acc_table[1,1] = run_static_ensemble_experiment(iris, 
                                                0.7, TRUE, cross_val)
acc_table[1,2] = run_static_ensemble_experiment(wine, 
                                                0.7, TRUE, cross_val)
acc_table[1,3] = run_static_ensemble_experiment(ion, 
                                                0.7, TRUE, cross_val)

acc_table[2,1] = run_static_ensemble_experiment(iris, 
                                                0.7, TRUE, bagging)
acc_table[2,2] = run_static_ensemble_experiment(wine, 
                                                0.7, TRUE, bagging)
acc_table[2,3] = run_static_ensemble_experiment(ion,
                                                0.7, TRUE, bagging)

Weighted voting

To perform weighted voting we need an additional validation subset on which to learn the weights, so we split the original dataset into 3 parts: train, validation and test. We will use the standard proportions 0.6/0.2/0.2. And again we will discretize the train set and save the thresholds.

split_into_three <- function(dset, ratios, discr = TRUE) {
    # randomly assign each row to train/test/validate according to `ratios`
    g = sample(cut(seq(nrow(dset)), 
                   nrow(dset) * cumsum(c(0, ratios)),
                   labels = names(ratios)))
    res = split(dset, g)
    train = res$train
    test = res$test
    validation = res$validate
    if (discr) {
        # discretize the train set and keep the cut points (thresholds)
        trainD = chiM(train)
        train = trainD$Disc.data
        th = trainD$cutp
    }
    else {
        th = NULL
    }  
    list(train = train, test = test, validation = validation, th = th)
}

The following function applies weighted voting to a single vector of predictions (one prediction per classifier).

apply_weights <- function(row, weights) {
    classes = unique(row)
    answer = row[1]
    max_vote = 0
    for (c in classes) {
        # sum the weights of all classifiers that voted for class c
        inds = which(row==c)
        votes = sum(weights[inds])
        if (votes > max_vote) {
            answer = c
            max_vote = votes
        }
    }
    answer
}
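
A small made-up example shows how the weights can overrule a simple majority:

# Three classifiers vote "b", "b", "a"; the single "a" vote carries more total
# weight (0.9 vs 0.6), so weighted voting returns "a" where simple majority
# voting would return "b".
apply_weights(c("b", "b", "a"), c(0.3, 0.3, 0.9))
## [1] "a"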

The following function performs weighted voting over the whole test set.

weighted_voting <- function(tables, weights, testX, 
                            n_sets = 10, discr = TRUE, th = NULL) {
    predictions = c()
    for (t in tables) {
        prediction = apply(testX,1,classify, t,th,1,discr)
        predictions = cbind(predictions, prediction)
    }
    # combine using weighted voting
    preds = apply(predictions, 1, function(x) { apply_weights(x, weights)})
    preds
}

Since we have no prior knowledge of how good each classifier is, we need to assign the weights somehow. One method is to evaluate each classifier’s accuracy on the validation set and use that accuracy as its weight. The following function implements this approach.

get_weights <- function(tables, validation, th) {
    tci = ncol(validation)
    valX = validation[,-tci]
    valy = validation[,tci]
    weights = c()
    for (t in tables) {
        # each classifier's weight is its accuracy on the validation set
        predictions = apply(valX,1,classify, t,th,1,TRUE)
        w = sum(predictions == valy)/length(valy)
        weights = c(weights, w)
    }
    weights
}

Now we can write the function to run the experiment.

run_weighted_voting_experiment <- function(dset, ratios, 
                                           discr = TRUE, 
                                           construction_function, 
                                           get_weights_function) {
    dsets = split_into_three(dset, ratios, discr = discr) 
    train = dsets$train
    validation = dsets$validation
    test = dsets$test
    testX = test[,-ncol(test)]
    testy = test[,ncol(test)]
    th = dsets$th
    target_column_index = ncol(train)
    # get classifiers
    tables = construction_function(train, 10, target_column_index)
    # get weights from the validation set
    weights = get_weights_function(tables, validation, th)
    preds = weighted_voting(tables, weights, testX, 10, discr, th)
    # accuracy on the test set
    sum(preds == testy)/length(testy)
}

Now run the experiments and save the results:

spec = c(train = .6, test = .2, validate = .2)   
acc_table[3,1] = run_weighted_voting_experiment(iris, 
                                                spec, TRUE, 
                                                cross_val, get_weights)
acc_table[3,2] = run_weighted_voting_experiment(wine, 
                                                spec, TRUE, 
                                                cross_val, get_weights)
acc_table[3,3] = run_weighted_voting_experiment(ion, 
                                                spec, TRUE, 
                                                cross_val, get_weights)
acc_table[4,1] = run_weighted_voting_experiment(iris, 
                                                spec, TRUE, 
                                                bagging, get_weights)
acc_table[4,2] = run_weighted_voting_experiment(wine, 
                                                spec, TRUE, 
                                                bagging, get_weights)
acc_table[4,3] = run_weighted_voting_experiment(ion, 
                                                spec, TRUE, 
                                                bagging, get_weights)

Weighted majority approach

This algorithm is similar to the previous one, except that the weights are learned differently. After training the classifiers we take the validation set and apply each classifier to every example in it. Whenever a classifier gives a wrong answer we penalize it by multiplying its weight by 0.5. Initially all classifiers have weight 1, so after the validation pass each classifier’s weight equals 0.5 raised to the number of mistakes it made.

get_weights_online <- function(tables, validation, th) {
    tci = ncol(validation)
    valX = validation[,-tci]
    valy = validation[,tci]
    N = length(tables)
    weights = rep(1,N)
    for (q in 1:nrow(validation)) {
        query = valX[q,]
        for (i in 1:N) {
            ans = apply(query, 1, classify, tables[[i]], th, 1, TRUE)
            # halve the weight of every classifier that misclassifies this example
            if (valy[q] != ans) {
                weights[i] = weights[i]*0.5
            }
        }
    }
    weights
}

Since the rest of the pipeline is the same as in the previous method, we can immediately run the experiment, passing the new weight-learning function.

spec = c(train = .6, test = .2, validate = .2) 
acc_table[5,1] = run_weighted_voting_experiment(iris, spec, 
                                                TRUE, cross_val, 
                                                get_weights_online)
acc_table[5,2] = run_weighted_voting_experiment(wine, spec, 
                                                TRUE, cross_val, 
                                                get_weights_online)
acc_table[5,3] = run_weighted_voting_experiment(ion, spec, 
                                                TRUE, cross_val, 
                                                get_weights_online)
acc_table[6,1] = run_weighted_voting_experiment(iris, spec, 
                                                TRUE, bagging, 
                                                get_weights_online)
acc_table[6,2] = run_weighted_voting_experiment(wine, spec, 
                                                TRUE, bagging, 
                                                get_weights_online)
acc_table[6,3] = run_weighted_voting_experiment(ion, spec, 
                                                TRUE, bagging, 
                                                get_weights_online)

Summary

library(knitr)
kable(acc_table,digits=4,caption="Accuracy Table")
Accuracy Table

                               Iris     Wine   Ionosphere
Static ensemble: CV          0.9778   0.9623       0.8762
Static ensemble: Bagging     0.8889   0.9057       0.8952
Weighted voting: CV          0.9333   0.9444       0.9143
Weighted voting: Bagging     1.0000   0.9722       0.9000
Weighted Majority: CV        0.9667   1.0000       0.8714
Weighted Majority: Bagging   0.9000   1.0000       0.8857

In this particular case the accuracy is hardly better than that of the single Naive Bayes classifier on the same datasets, but there are clear reasons for this. In the previous work we split the data into only 2 subsets, whereas here, in the last two experiments, we lose a significant part of the information by training on only 0.6 of the data while using 0.2 for testing and 0.2 for validation. Moreover, there is one assumption we didn’t pay attention to at the beginning:

An ensemble is more accurate than its component classifiers only if the individual classifiers disagree with one another.

In our case this rarely holds. After running the 10 classifiers and printing out their predictions, we found that they almost always produce identical results, which must be why there was no significant improvement over the basic Bayes classifier.
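
One simple way to check this (a diagnostic sketch, not part of the experiments above) is to take the matrix of per-classifier predictions that the voting functions already build internally and measure how often all ten classifiers return exactly the same label:

# Diagnostic sketch: `predictions` is a matrix with one column per classifier
# and one row per test example, as built inside the voting functions.
agreement_rate <- function(predictions) {
    mean(apply(predictions, 1, function(x) length(unique(x)) == 1))
}
# A value close to 1 means the ensemble members almost never disagree,
# so voting cannot improve much on a single classifier.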