1 Sample Groups

1.1 Sample Summary

1.2 Data Preprocessing

CEL files were processed using the oligo package. Robust Multi-array Average (RMA) was used to background-correct, normalize, and summarize the probe-level data. Annotations were taken from the hugene10sttranscriptcluster database. Control probes were removed before linear modelling.
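RMA's middle step, quantile normalization, forces every array to share the same empirical intensity distribution. The sketch below illustrates that step only, in plain Python on a toy matrix. It is not the oligo implementation (which is called from R), it skips background correction and median-polish summarization, and it breaks ties by position rather than averaging them. The function name `quantile_normalize` and the sample values are made up for illustration.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per array)."""
    n = len(columns[0])
    # Sort each array and average across arrays at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n)]
    normalized = []
    for col in columns:
        # order[r] is the index of the r-th smallest value in this array.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace value by the mean at its rank
        normalized.append(out)
    return normalized


# Toy data: three hypothetical arrays with four probes each.
arrays = [[5.0, 2.0, 3.0, 4.0],
          [4.0, 1.0, 4.5, 2.0],
          [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(arrays)
```

After normalization every array contains the same set of values, and only their order within each array (the probe ranking) differs.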

2 Comparison 1: kSORT

The kSORT assay genes (table below) were evaluated as predictors of acute rejection (Group 3), chronic antibody-mediated rejection (Group 4), and BKV viremia (Group 5), each compared with Groups 1 and 2.

2.1 Group 1 as control

2.1.1 G3 vs G1

2.1.1.1 kSORT Subset Gene Expression

2.1.1.2 kSORT Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.
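The evaluation loop described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code behind the report: a nearest-centroid scorer stands in for the SVM, the expression data are simulated, and all function names (`stratified_folds`, `auc`, `centroid_scorer`, `cross_validate`) are hypothetical. It shows how one pooled confusion matrix and one AUC value per fold come out of a stratified 5-fold split.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample to one of k folds, balancing classes across folds."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds


def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scorer(X, y):
    """Stand-in for the SVM: distance to control centroid minus distance to case centroid."""
    def centroid(cls):
        rows = [x for x, label in zip(X, y) if label == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    c0, c1 = centroid(0), centroid(1)
    return lambda x: dist(x, c0) - dist(x, c1)


def cross_validate(X, y, k=5, seed=0):
    folds = stratified_folds(y, k, seed)
    confusion = [[0, 0], [0, 0]]  # rows: true class, columns: predicted class
    fold_aucs = []
    for f in range(k):
        train = [i for i in range(len(y)) if folds[i] != f]
        test = [i for i in range(len(y)) if folds[i] == f]
        score = centroid_scorer([X[i] for i in train], [y[i] for i in train])
        scores = [score(X[i]) for i in test]
        for s, i in zip(scores, test):
            confusion[y[i]][int(s > 0)] += 1  # positive score -> predicted case
        fold_aucs.append(auc(scores, [y[i] for i in test]))
    return confusion, fold_aucs


# Simulated data: 20 controls (label 0) and 20 cases (label 1), 3 genes each.
rng = random.Random(1)
X = ([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
     + [[rng.gauss(1.5, 1.0) for _ in range(3)] for _ in range(20)])
y = [0] * 20 + [1] * 20
confusion, fold_aucs = cross_validate(X, y)
```

The five values in `fold_aucs` are the kind of per-fold AUCs a density plot would summarize. With only five folds the density is coarse, so repeating the cross-validation with different seeds is a common way to obtain a smoother picture.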

2.1.1.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.
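A minimal sketch of cross-validated RFE, assuming a simple linear scorer in place of the SVM: feature weights are taken as the difference in class means, which is only a stand-in for SVM coefficients. Within each training fold the weakest feature is dropped repeatedly, each nested subset is scored by AUC on the held-out fold, and the subset size with the highest mean AUC is kept. The data are simulated and every function name is hypothetical.

```python
import random


def auc(scores, labels):
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_weights(X, y, features):
    """Stand-in for linear SVM weights: difference in class means per feature."""
    def mean(cls, j):
        vals = [x[j] for x, label in zip(X, y) if label == cls]
        return sum(vals) / len(vals)
    return {j: mean(1, j) - mean(0, j) for j in features}


def elimination_path(X, y, n_features):
    """Nested feature subsets, from all features down to a single one."""
    features = list(range(n_features))
    path = [list(features)]
    while len(features) > 1:
        w = fit_weights(X, y, features)
        features.remove(min(features, key=lambda j: abs(w[j])))  # drop weakest
        path.append(list(features))
    return path


def rfe_cv(X, y, k=5, seed=0):
    n_features = len(X[0])
    rng = random.Random(seed)
    folds = [0] * len(y)
    for cls in (0, 1):  # stratified fold assignment
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    mean_auc = {size: 0.0 for size in range(1, n_features + 1)}
    for f in range(k):
        train = [i for i in range(len(y)) if folds[i] != f]
        test = [i for i in range(len(y)) if folds[i] == f]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        # Ranking happens on the training fold only, so the test fold stays unseen.
        for subset in elimination_path(X_tr, y_tr, n_features):
            w = fit_weights(X_tr, y_tr, subset)
            scores = [sum(w[j] * X[i][j] for j in subset) for i in test]
            mean_auc[len(subset)] += auc(scores, [y[i] for i in test]) / k
    best_size = max(mean_auc, key=mean_auc.get)
    # Refit the elimination on all samples and keep the subset of the best size.
    final = next(s for s in elimination_path(X, y, n_features)
                 if len(s) == best_size)
    return final, mean_auc


# Simulated data: 4 genes, of which genes 0 and 2 are shifted in the cases.
rng = random.Random(2)
shift = [2.0, 0.0, 1.5, 0.0]
X = ([[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]
     + [[rng.gauss(m, 1.0) for m in shift] for _ in range(20)])
y = [0] * 20 + [1] * 20
selected, mean_auc = rfe_cv(X, y)
```

Ranking features inside each training fold, rather than once on all samples, keeps the held-out AUC from being inflated by information from the test fold.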

2.1.2 G4 vs G1

2.1.2.1 kSORT Subset Gene Expression

2.1.2.2 kSORT Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

2.1.2.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.1.3 G5 vs G1

2.1.3.1 kSORT Subset Gene Expression

2.1.3.2 kSORT Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

2.1.3.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.1.4 G6 vs G1

2.1.4.1 kSORT Subset Gene Expression

2.1.4.2 kSORT Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

2.1.4.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2 Group 2 as control

2.2.1 G3 vs G2

2.2.1.1 kSORT Subset Gene Expression

2.2.1.2 kSORT Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

2.2.1.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2.2 G4 vs G2

2.2.2.1 kSORT Subset Gene Expression

2.2.2.2 kSORT Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

2.2.2.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2.3 G5 vs G2

2.2.3.1 kSORT Subset Gene Expression

2.2.3.2 kSORT Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

2.2.3.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2.4 G6 vs G2

2.2.4.1 kSORT Subset Gene Expression

2.2.4.2 kSORT Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the kSORT genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

2.2.4.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3 Comparison 2a: 5 Gene Subset

We want to assess whether the following 5 gene transcripts predict acute rejection (Group 3), chronic rejection (Group 4), BKV viremia (Group 5), or IFTA (Group 6), using Normal, Post-Transplant data (Group 2) as control. The subset of genes in question is the following:

3.1 Group 1 as control

3.1.1 G3 vs G1

3.1.1.1 Subset Gene Expression

3.1.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the subset of genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

3.1.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.1.2 G4 vs G1

3.1.2.1 Subset Gene Expression

3.1.2.2 Subset of Genes as predictors

3.1.2.3 Feature selection: finding an optimal subset of Genes

3.1.3 G5 vs G1

3.1.3.1 Subset Gene Expression

3.1.3.2 Subset of Genes as predictors

3.1.3.3 Feature selection: finding an optimal subset of Genes

3.1.4 G6 vs G1

3.1.4.1 Subset Gene Expression

3.1.4.2 Subset of Genes as predictors

3.1.4.3 Feature selection: finding an optimal subset of Genes

3.2 Group 2 as control

3.2.1 G3 vs G2

3.2.1.1 Subset Gene Expression

3.2.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the subset of genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

3.2.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.2.2 G4 vs G2

3.2.2.1 Subset Gene Expression

3.2.2.2 Subset of Genes as predictors

3.2.2.3 Feature selection: finding an optimal subset of Genes

3.2.3 G5 vs G2

3.2.3.1 Subset Gene Expression

3.2.3.2 Subset of Genes as predictors

3.2.3.3 Feature selection: finding an optimal subset of Genes

3.2.4 G6 vs G2

3.2.4.1 Subset Gene Expression

3.2.4.2 Subset of Genes as predictors

3.2.4.3 Feature selection: finding an optimal subset of Genes

4 Comparison 2b: 3 Gene Subset

We want to assess whether the following 3 gene transcripts predict acute rejection (Group 3), chronic rejection (Group 4), BKV viremia (Group 5), or IFTA (Group 6), using Normal, Post-Transplant data (Group 2) as control. The subset of genes in question is the following:

4.1 G1 as control

4.1.1 G3 vs G1

4.1.1.1 Subset Gene Expression

4.1.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

4.1.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.1.2 G4 vs G1

4.1.2.1 Subset Gene Expression

4.1.2.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

4.1.2.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.1.3 G5 vs G1

4.1.3.1 Subset Gene Expression

4.1.3.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

4.1.3.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.1.4 G6 vs G1

4.1.4.1 Subset Gene Expression

4.1.4.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

4.1.4.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.2 G2 as control

We want to assess whether the following 3 gene transcripts predict acute rejection (Group 3), chronic rejection (Group 4), BKV viremia (Group 5), or IFTA (Group 6), using Normal, Post-Transplant data (Group 2) as control. The subset of genes in question is the following:

  • KLF6
  • BNC2
  • CYP1B1

4.2.1 G3 vs G2

4.2.1.1 Subset Gene Expression

4.2.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

4.2.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.2.2 G4 vs G2

4.2.2.1 Subset Gene Expression

4.2.2.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

4.2.2.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.2.3 G5 vs G2

4.2.3.1 Subset Gene Expression

4.2.3.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

4.2.3.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.2.4 G6 vs G2

4.2.4.1 Subset Gene Expression

4.2.4.2 Subset of Genes as predictors

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the 3-gene subset. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

4.2.4.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine algorithm and 5-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

5 Comparison 3: Entire Data set

5.1 G3 vs G2

5.1.0.1 Feature selection: finding an optimal subset of Genes

A Support Vector Machine (SVM) algorithm was used to predict the classes from the expression values of the subset of genes. The confusion matrices were generated with 5-fold cross-validation, and the density plot shows the area under the receiver operating characteristic curve (AUC) values obtained from the same 5-fold cross-validation.

5.1.0.2 Differential Expression Analysis

5.2 G4 vs G2

5.2.0.1 Feature selection: finding an optimal subset of Genes

5.2.0.2 Differential Expression Analysis

5.3 G5 vs G2

5.3.0.1 Feature selection: finding an optimal subset of Genes

5.3.0.2 Differential Expression Analysis

5.4 G6 vs G2

5.4.0.1 Feature selection: finding an optimal subset of Genes

6 Comparison 4: Limma Romer Analysis of PBTs

6.1 G3 vs G2

6.2 G4 vs G2

6.3 G5 vs G2

6.4 G6 vs G2