DMwR Chapter 7 - Classifying Microarray Samples

Introduction

The fourth case study is from the area of bioinformatics. Namely, we will address the problem of classifying microarray samples into a set of alternative classes. More specifically, given a microarray probe that describes the gene expression levels of a patient, we aim to classify this patient into a pre-defined set of genetic mutations of acute lymphoblastic leukemia.

This case study addresses several new data mining topics. The main focus, given the characteristics of this type of dataset, is on feature selection, that is, how to reduce the number of features that describe each observation. In our approach to this particular application we will illustrate several general methods for feature selection. Other new data mining topics addressed in this chapter include k-nearest neighbors classifiers, bootstrap estimates, and some new variants of ensemble models.
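As a rough illustration of what reducing the number of features can look like, the sketch below applies a simple unsupervised filter: it keeps only the genes whose expression varies most across samples, measured by the interquartile range (IQR). Everything here is an assumption made for illustration: the data are synthetic, the object names (`expr`, `gene_iqr`, `expr_filtered`) are hypothetical, and the 90th-percentile cutoff is arbitrary. It is not the book's code and does not use the leukemia dataset.

```r
# Synthetic stand-in for an expression matrix: rows are genes, columns are samples.
set.seed(42)
n_genes <- 1000
n_samples <- 20
expr <- matrix(
  rnorm(n_genes * n_samples),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Make the first 50 genes more variable, so the filter has something to find.
expr[1:50, ] <- expr[1:50, ] * 5

# Per-gene spread across samples, measured by the interquartile range.
gene_iqr <- apply(expr, 1, IQR)

# Hypothetical cutoff: keep genes whose IQR is above the 90th percentile.
cutoff <- quantile(gene_iqr, 0.9)
keep <- gene_iqr > cutoff

# Reduced matrix with the same samples but far fewer genes.
expr_filtered <- expr[keep, , drop = FALSE]
dim(expr_filtered)
```

A filter like this never looks at the class labels, so it can be applied before any model is trained; supervised alternatives that rank genes by their relation to the target are a different design choice.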

This case study will provide information on:

Feature selection methods for problems with a very large number of predictors
Classification methods
Random forests
k-Nearest neighbors
SVMs
Ensembles using different subsets of predictors
Bootstrap experiments

Reference: Data Mining with R (2nd Edition) by Professor Luis Torgo, pages 353 - 381.

Load Libraries

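library(dplyr)   # data manipulation verbs
library(DT)      # interactive HTML tables
library(DMwR)    # companion package to the first edition of the book
library(DMwR2)   # companion package to the second edition of the book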
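
Since several packages are attached here, it can help to check beforehand that each one is installed. The following is a minimal sketch using base R's `requireNamespace()`; the variable names `pkgs`, `installed`, and `not_installed` are hypothetical, and the check only reports what is missing rather than installing anything.

```r
# Packages this notebook attaches.
pkgs <- c("dplyr", "DT", "DMwR", "DMwR2")

# TRUE for each package whose namespace can be loaded, FALSE otherwise.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that is not available, without stopping the session.
not_installed <- pkgs[!installed]
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
}
```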