1 Sample Groups

1.1 Sample Summary

1.2 Data Preprocessing

CEL files were processed using the oligo package. Robust multichip averaging (RMA) was used to background-correct, normalize, and summarize the probe-level data. Annotations were taken from the hugene10sttranscriptcluster.db annotation database. Control probes were removed before linear modelling.
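To make the normalization step concrete, the following is a minimal pure-Python sketch of quantile normalization, the normalization method used within RMA. It is not the oligo implementation: the function name `quantile_normalize` and the toy matrix are illustrative assumptions, background correction and median-polish summarization are omitted, and ties are broken by position rather than averaged.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a probes x arrays matrix (list of rows).

    Illustrative sketch only: every array (column) is forced to share the
    same empirical distribution, namely the mean of the sorted columns.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Mean intensity at each rank, averaged across arrays.
    sorted_columns = [sorted(col) for col in columns]
    rank_means = [
        sum(col[i] for col in sorted_columns) / n_cols for i in range(n_rows)
    ]

    # Replace each value by the mean intensity of its within-array rank.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]
    return result


# Toy intensities: 4 probes (rows) measured on 3 arrays (columns).
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 3.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(toy)
```

After normalization every column contains the same set of values, so differences between arrays that are due only to overall intensity distribution are removed.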

2 Comparison 1: kSORT

This comparison tests whether the kSORT assay genes (table below) predict acute rejection (Group 3), chronic antibody-mediated rejection (Group 4), BKV viremia (Group 5), and IFTA (Group 6) relative to Groups 1 and 2.

2.1 Group 1 as control

2.1.1 G3 vs G1

2.1.1.1 kSORT Subset Gene Expression

2.1.1.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 5-fold cross-validation.
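The procedure described above can be sketched with scikit-learn as follows. This is an assumption-laden illustration rather than the code behind the report: the linear kernel, the feature scaling, the stratified and shuffled folds, the helper name `cross_validated_svm`, and the synthetic data standing in for the expression matrix are all choices made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def cross_validated_svm(X, y, n_splits=5, seed=0):
    """Return the pooled confusion matrix and the per-fold AUC values."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    pooled_cm = np.zeros((2, 2), dtype=int)
    fold_aucs = []
    for train_idx, test_idx in cv.split(X, y):
        # Fit scaling and SVM on the training fold only to avoid leakage.
        model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
        model.fit(X[train_idx], y[train_idx])

        predictions = model.predict(X[test_idx])
        pooled_cm += confusion_matrix(y[test_idx], predictions, labels=[0, 1])

        # AUC is computed from the signed distance to the decision boundary.
        scores = model.decision_function(X[test_idx])
        fold_aucs.append(roc_auc_score(y[test_idx], scores))
    return pooled_cm, np.array(fold_aucs)


# Synthetic stand-in for a samples x genes expression matrix (hypothetical).
X, y = make_classification(
    n_samples=60, n_features=10, n_informative=5, random_state=0
)
cm, aucs = cross_validated_svm(X, y)
print(cm)
print(aucs)
```

The per-fold AUC values in `aucs` are the kind of quantity a density plot would summarize; repeating the cross-validation with different seeds would give more values to plot.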

2.1.1.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.
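A minimal sketch of this kind of feature selection, using scikit-learn's `RFECV`, is given below. It assumes a linear-kernel SVM (RFE needs per-feature coefficients to rank features), removal of one feature per step, and stratified folds; the synthetic data is a stand-in for the expression matrix, and none of these settings are confirmed by the report.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for a samples x genes expression matrix (hypothetical).
X, y = make_classification(
    n_samples=60,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Drop the lowest-ranked feature at each step and score every candidate
# subset size by cross-validated ROC AUC.
selector = RFECV(
    estimator=SVC(kernel="linear"),
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
selector.fit(X, y)

# Column indices of the features retained in the best-scoring subset.
selected = [i for i, keep in enumerate(selector.support_) if keep]
print(selector.n_features_, selected)
```

With a real expression matrix, the indices in `selected` would be mapped back to gene names to report the chosen subset.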

2.1.2 G4 vs G1

2.1.2.1 kSORT Subset Gene Expression

2.1.2.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 5-fold cross-validation.

2.1.2.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.1.3 G5 vs G1

2.1.3.1 kSORT Subset Gene Expression

2.1.3.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 5-fold cross-validation.

2.1.3.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.1.4 G6 vs G1

2.1.4.1 kSORT Subset Gene Expression

2.1.4.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 5-fold cross-validation.

2.1.4.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2 Group 2 as control

2.2.1 G3 vs G2

2.2.1.1 kSORT Subset Gene Expression

2.2.1.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 5-fold cross-validation.

2.2.1.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2.2 G4 vs G2

2.2.2.1 kSORT Subset Gene Expression

2.2.2.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 5-fold cross-validation.

2.2.2.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2.3 G5 vs G2

2.2.3.1 kSORT Subset Gene Expression

2.2.3.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 5-fold cross-validation.

2.2.3.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2.4 G6 vs G2

2.2.4.1 kSORT Subset Gene Expression

2.2.4.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 5-fold cross-validation.

2.2.4.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3 Comparison 2a: 3 Gene Subset

This comparison tests whether 3 gene transcripts (KLF6, BNC2, and CYP1B1) predict acute rejection (Group 3), chronic rejection (Group 4), BKV viremia (Group 5), and IFTA (Group 6), using Group 1 (Section 3.1) and the Normal, Post-Transplant data of Group 2 (Section 3.2) as controls.

3.1 G1 as control

3.1.1 G3 vs G1

3.1.1.1 Subset Gene Expression

3.1.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 3-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 3-fold cross-validation.

3.1.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 3-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.1.2 G4 vs G1

3.1.2.1 Subset Gene Expression

3.1.2.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 3-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 3-fold cross-validation.

3.1.2.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 3-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.1.3 G5 vs G1

3.1.3.1 Subset Gene Expression

3.1.3.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 3-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 3-fold cross-validation.

3.1.3.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 3-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.1.4 G6 vs G1

3.1.4.1 Subset Gene Expression

3.1.4.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 3-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 3-fold cross-validation.

3.1.4.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 3-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.2 G2 as control

This comparison tests whether the following 3 gene transcripts predict acute rejection (Group 3), chronic rejection (Group 4), BKV viremia (Group 5), and IFTA (Group 6), using Normal, Post-Transplant data (Group 2) as control. The subset of genes in question is the following:

  • KLF6
  • BNC2
  • CYP1B1

3.2.1 G3 vs G2

3.2.1.1 Subset Gene Expression

3.2.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 3-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 3-fold cross-validation.

3.2.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 3-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.2.2 G4 vs G2

3.2.2.1 Subset Gene Expression

3.2.2.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 3-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 3-fold cross-validation.

3.2.2.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 3-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.2.3 G5 vs G2

3.2.3.1 Subset Gene Expression

3.2.3.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 3-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 3-fold cross-validation.

3.2.3.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 3-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.2.4 G6 vs G2

3.2.4.1 Subset Gene Expression

3.2.4.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 3-fold cross-validation, and the density plot shows the distribution of the area under the receiver operating characteristic curve from the same 3-fold cross-validation.

3.2.4.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine and 3-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.