DMwR Chapter 7 - Classifying Microarray Samples
Introduction
The fourth case study is from the area of bioinformatics. Namely, we will address the problem of classifying microarray samples into a set of alternative classes. More specifically, given a microarray probe that describes the gene expression levels of a patient, we aim to assign this patient to one of a predefined set of genetic mutations of acute lymphoblastic leukemia.
This case study addresses several new data mining topics. The main focus, given the characteristics of this type of dataset, is on feature selection, that is, how to reduce the number of features that describe each observation. In our approach to this particular application we will illustrate several general methods for feature selection. Other new data mining topics addressed in this chapter include k-nearest neighbors classifiers, bootstrap estimates, and some new variants of ensemble models.
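To make the idea of feature selection concrete, here is a minimal sketch of one simple unsupervised filter: score every feature by its inter-quartile range (IQR) and keep only the most variable ones. The data are simulated, and the sizes, the inflation of the first 50 columns, and the cutoff `k` are all assumptions made for illustration. This is not the chapter's dataset and not necessarily the filtering procedure the chapter ends up using.

```r
# Minimal sketch with simulated data (assumption: NOT the chapter's dataset).
set.seed(1234)

n_samples  <- 20    # observations (patients); hypothetical size
n_features <- 1000  # predictors (genes); hypothetical size

# Rows are samples, columns are features.
expr <- matrix(rnorm(n_samples * n_features), nrow = n_samples,
               dimnames = list(NULL, paste0("gene", seq_len(n_features))))

# Inflate the spread of the first 50 features so that some genes vary more.
expr[, 1:50] <- expr[, 1:50] * 5

# Score each feature (column) by its inter-quartile range.
iqr_scores <- apply(expr, 2, IQR)

# Keep the k features with the largest IQR; k is an arbitrary choice here.
k <- 50
keep <- names(sort(iqr_scores, decreasing = TRUE))[seq_len(k)]
reduced <- expr[, keep, drop = FALSE]

dim(expr)     # 20 1000
dim(reduced)  # 20 50
```

A filter like this ignores the class labels entirely, which keeps it cheap when predictors number in the thousands. Supervised alternatives would instead score each feature by how well it separates the classes.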
This case study will provide information on:
- Feature selection methods for problems with a very large number of predictors
- Classification methods
- Random forests
- k-Nearest neighbors
- SVMs
- Ensembles using different subsets of predictors
- Bootstrap experiments
Reference: Data Mining with R (2nd Edition) by Professor Luis Torgo, pages 353 - 381.
Load Libraries
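library(dplyr)  # data manipulation
library(DT)     # interactive HTML tables
library(DMwR)   # companion package for the book's 1st edition
library(DMwR2)  # companion package for the book's 2nd edition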