CEL files were processed using the oligo package. Robust Multi-array Average (RMA) was used to background-correct, normalize, and summarize the probe-level data. Annotations were taken from the hugene10sttranscriptcluster database. Control probes were removed before linear modelling.
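To illustrate the normalization step, RMA uses quantile normalization, which forces every array to share the same intensity distribution. The following is a minimal Python sketch of that one step on toy data. It is not the oligo implementation, and the function name `quantile_normalize` and the example values are assumptions made for illustration only.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize equal-length arrays (one list of intensities per array).

    Illustrative helper, not part of oligo: each value is replaced by the mean,
    across arrays, of the values that share its rank.
    """
    n = len(columns[0])
    # Sort each array, then average across arrays at each rank position.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n)]

    normalized = []
    for col in columns:
        # Probe indices in ascending order of intensity (ties broken by position).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy example: three arrays with three probes each (made-up values).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0]]
normalized = quantile_normalize(arrays)
```

After normalization every array contains the same set of values, and only the order of the probes within each array differs. Background correction and the median-polish summarization that RMA also performs are not shown here.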
A support vector machine (SVM) was used to predict sample classes from the expression values of the selected subset of genes. The confusion matrices were generated using 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the 5-fold cross-validation.
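The evaluation scheme can be sketched as follows. This is a minimal Python illustration under several assumptions, not the code used in the study: the kernel and software are not stated above, so a linear SVM trained by stochastic subgradient descent on the hinge loss stands in for the classifier, the folds are stratified by class, the AUC is computed separately for each fold, and the data are synthetic. All function names and parameter values are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=50, seed=0):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    Labels must be -1 or +1. Stand-in classifier, assumed for illustration.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrinkage from the L2 penalty
            if margin < 1.0:  # sample violates the margin: take a hinge-loss step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def decision(model, x):
    """Signed decision value for one sample."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k=5):
    """Return a pooled confusion matrix {(true, predicted): count} and per-fold AUCs."""
    confusion = {(t, p): 0 for t in (-1, 1) for p in (-1, 1)}
    fold_aucs = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [decision(model, X[i]) for i in test_idx]
        for i, s in zip(test_idx, scores):
            confusion[(y[i], 1 if s >= 0 else -1)] += 1
        fold_aucs.append(auc(scores, [y[i] for i in test_idx]))
    return confusion, fold_aucs


# Synthetic two-gene "expression" data for two well-separated classes (made up).
rng = random.Random(42)
y = [1] * 20 + [-1] * 20
X = [[rng.gauss(2.0 * label, 0.5), rng.gauss(2.0 * label, 0.5)] for label in y]
confusion, fold_aucs = cross_validate(X, y)
```

Each sample is held out exactly once, so the pooled confusion matrix counts every sample once, while `fold_aucs` holds one AUC value per fold. Repeating the procedure with different fold assignments would yield a collection of AUC values whose distribution could be drawn as a density plot.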