1 Sample Groups

Group 1: NORMAL, PRETRANSPLANT (n=21)
Group 2: NORMAL, POSTTRANSPLANT (n=21)
Group 3: ACUTE REJECTION (n=7)
Group 4: CHRONIC ANTIBODY MEDIATED REJECTION (n=21)
Group 5: BKV VIREMIA (n=19)
Group 6: IFTA (n=18; sample 9v0 (10666) is missing)

1.1 Sample Summary

1.2 Data Preprocessing

CEL files were processed using the oligo package. Robust Multichip Average (RMA) was used to background-correct, normalize, and summarize probe-level data. Annotations were taken from the hugene10sttranscriptcluster database. Control probes were removed before linear modelling.
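To illustrate the normalization step only: RMA normalizes arrays by quantile normalization, which forces every array to share the same intensity distribution. The sketch below is a toy plain-Python version run on made-up intensities. It is not the oligo implementation, which runs in R and also performs background correction and median-polish summarization. The function name `quantile_normalize` and the example values are assumptions made for illustration.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per array)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    # Indices that sort each column (stable sort, so ties keep input order).
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest value across columns.
    reference = [
        mean([col[order[k]] for col, order in zip(columns, orders)])
        for k in range(n)
    ]
    # Put the reference value back at each probe's original position.
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Made-up log-intensities for three arrays and four probes (illustration only).
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
for column in normalized:
    print([round(value, 3) for value in column])
```

After normalization every column contains the same set of values, and only their order across probes differs between arrays.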

2 Comparison 1: kSORT

kSORT assay genes (table below) were evaluated as predictors of acute rejection (Group 3), chronic antibody mediated rejection (Group 4), BKV viremia (Group 5), and IFTA (Group 6), each compared against Groups 1 and 2.
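The pairwise design used throughout the report can be enumerated as a short sketch. The group sizes are those listed under Sample Groups; the labels `G1` to `G6` follow the subsection headings, and the variable names are assumptions made for illustration.

```python
from itertools import product

# Group sizes as listed under "Sample Groups".
group_sizes = {"G1": 21, "G2": 21, "G3": 7, "G4": 21, "G5": 19, "G6": 18}
controls = ["G1", "G2"]
conditions = ["G3", "G4", "G5", "G6"]

# One entry per case-versus-control comparison.
comparisons = [
    {
        "case": case,
        "control": ctrl,
        "n_case": group_sizes[case],
        "n_control": group_sizes[ctrl],
        "case_fraction": group_sizes[case]
        / (group_sizes[case] + group_sizes[ctrl]),
    }
    for ctrl, case in product(controls, conditions)
]

for c in comparisons:
    print(
        f'{c["case"]} vs {c["control"]}: {c["n_case"]} vs {c["n_control"]} '
        f'(case fraction {c["case_fraction"]:.2f})'
    )
```

The G3 comparisons are the most unbalanced (7 cases against 21 controls), so with 5 folds some test folds contain a single acute rejection sample.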

2.1 Group 1 as control

2.1.1 G3 vs G1

2.1.1.1 kSORT Subset Gene Expression

2.1.1.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.
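The evaluation scheme described here can be sketched as follows. This is a minimal, dependency-free illustration and not the code behind the report: the kernel and SVM settings are not stated in the report, so a linear SVM trained by Pegasos-style sub-gradient descent is assumed. The data are synthetic, and all function names (`stratified_folds`, `train_linear_svm`, `decision`, `auc`, `cross_validate`) are hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss) by Pegasos-style SGD; last weight is the bias."""
    rng = random.Random(seed)
    Xa = [row + [1.0] for row in X]  # constant feature acts as the bias
    w, step = [0.0] * len(Xa[0]), 0
    for _ in range(epochs):
        order = list(range(len(Xa)))
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = y[i] * sum(a * b for a, b in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * a for a in w]
            if margin < 1.0:
                w = [a + eta * y[i] * b for a, b in zip(w, Xa[i])]
    return w


def decision(w, row):
    """Signed distance-like score; positive means class +1."""
    return sum(a * b for a, b in zip(w, row + [1.0]))


def auc(scores, labels):
    """Fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))  # each fold must hold both classes


def cross_validate(X, y, k=5, seed=0):
    """Return a pooled confusion matrix and one AUC per held-out fold."""
    confusion = {(true, pred): 0 for true in (1, -1) for pred in (1, -1)}
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        w = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [decision(w, X[i]) for i in test_idx]
        truth = [y[i] for i in test_idx]
        aucs.append(auc(scores, truth))
        for s, true in zip(scores, truth):
            confusion[(true, 1 if s >= 0 else -1)] += 1
    return confusion, aucs


# Synthetic stand-in for a 7-case versus 21-control comparison, 3 genes.
rng = random.Random(1)
X = [[rng.gauss(1.5, 1.0) for _ in range(3)] for _ in range(7)]
X += [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(21)]
y = [1] * 7 + [-1] * 21
confusion, aucs = cross_validate(X, y)
print("confusion (true, predicted):", confusion)
print("per-fold AUC:", [round(a, 2) for a in aucs])
```

A density plot of AUC values would be drawn from a list such as `aucs`; repeating the cross-validation with different seeds would give more values to plot.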

2.1.1.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.
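A rough sketch of recursive feature elimination scored by cross-validated AUC is given below. It assumes a linear SVM whose absolute weights rank the features, with the ranking recomputed inside each training fold; the report does not state these details, so they are assumptions. The data are synthetic and all function names are hypothetical.

```python
import random


def fit_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Linear SVM by Pegasos-style SGD; the last weight is the bias."""
    rng = random.Random(seed)
    Xa = [row + [1.0] for row in X]
    w, step = [0.0] * len(Xa[0]), 0
    for _ in range(epochs):
        order = list(range(len(Xa)))
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = y[i] * sum(a * b for a, b in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * a for a in w]
            if margin < 1.0:
                w = [a + eta * y[i] * b for a, b in zip(w, Xa[i])]
    return w


def score(w, row):
    return sum(a * b for a, b in zip(w, row + [1.0]))


def auc(scores, labels):
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def take(X, cols):
    return [[row[c] for c in cols] for row in X]


def rfe_path(X, y):
    """Nested feature subsets, from all features down to a single one."""
    cols = list(range(len(X[0])))
    path = [list(cols)]
    while len(cols) > 1:
        w = fit_svm(take(X, cols), y)
        # Drop the feature with the smallest absolute weight (bias excluded).
        weakest = min(range(len(cols)), key=lambda j: abs(w[j]))
        cols.pop(weakest)
        path.append(list(cols))
    return path


def select_by_cv(X, y, k=5, seed=0):
    """Pick the subset size with the best mean AUC over k stratified folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (1, -1):
        members = [i for i, lab in enumerate(y) if lab == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    mean_auc = {size: 0.0 for size in range(1, len(X[0]) + 1)}
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xte, yte = [X[i] for i in test], [y[i] for i in test]
        for cols in rfe_path(Xtr, ytr):  # ranking redone inside each fold
            w = fit_svm(take(Xtr, cols), ytr)
            fold_auc = auc([score(w, r) for r in take(Xte, cols)], yte)
            mean_auc[len(cols)] += fold_auc / k
    best_size = max(mean_auc, key=mean_auc.get)
    chosen = next(c for c in rfe_path(X, y) if len(c) == best_size)
    return chosen, mean_auc


# Synthetic data: 7 cases, 21 controls, 4 features; only 0 and 1 carry signal.
rng = random.Random(2)
shift = [1.5, 1.5, 0.0, 0.0]
X = [[rng.gauss(m, 1.0) for m in shift] for _ in range(7)]
X += [[rng.gauss(0.0, 1.0) for _ in shift] for _ in range(21)]
y = [1] * 7 + [-1] * 21
chosen, mean_auc = select_by_cv(X, y)
print("selected feature indices:", chosen)
print("mean AUC by subset size:", {s: round(v, 2) for s, v in mean_auc.items()})
```

Redoing the ranking inside each training fold keeps the held-out samples out of the selection step, which avoids an optimistic bias in the reported AUC.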

2.1.2 G4 vs G1

2.1.2.1 kSORT Subset Gene Expression

2.1.2.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

2.1.2.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.1.3 G5 vs G1

2.1.3.1 kSORT Subset Gene Expression

2.1.3.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

2.1.3.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.1.4 G6 vs G1

2.1.4.1 kSORT Subset Gene Expression

2.1.4.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

2.1.4.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2 Group 2 as control

2.2.1 G3 vs G2

2.2.1.1 kSORT Subset Gene Expression

2.2.1.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

2.2.1.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2.2 G4 vs G2

2.2.2.1 kSORT Subset Gene Expression

2.2.2.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

2.2.2.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2.3 G5 vs G2

2.2.3.1 kSORT Subset Gene Expression

2.2.3.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

2.2.3.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

2.2.4 G6 vs G2

2.2.4.1 kSORT Subset Gene Expression

2.2.4.2 kSORT Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the kSORT genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

2.2.4.3 Feature selection: finding an optimal subset of kSORT Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3 Comparison 2a: 5 Gene Subset

We want to assess whether the following 5 gene transcripts predict acute rejection (Group 3), chronic rejection (Group 4), BKV viremia (Group 5), or IFTA (Group 6), using normal post-transplant samples (Group 2) as control. The genes in the subset are:

MARCHF8
DCAF12
FLT3
IL1R2
PDCD1

3.1 Group 1 as control

3.1.1 G3 vs G1

3.1.1.1 Subset Gene Expression

3.1.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the subset of genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

3.1.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.1.2 G4 vs G1

3.1.2.1 Subset Gene Expression

3.1.2.2 Subset of Genes as predictors

3.1.2.3 Feature selection: finding an optimal subset of Genes

3.1.3 G5 vs G1

3.1.3.1 Subset Gene Expression

3.1.3.2 Subset of Genes as predictors

3.1.3.3 Feature selection: finding an optimal subset of Genes

3.1.4 G6 vs G1

3.1.4.1 Subset Gene Expression

3.1.4.2 Subset of Genes as predictors

3.1.4.3 Feature selection: finding an optimal subset of Genes

3.2 Group 2 as control

3.2.1 G3 vs G2

3.2.1.1 Subset Gene Expression

3.2.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the subset of genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

3.2.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

3.2.2 G4 vs G2

3.2.2.1 Subset Gene Expression

3.2.2.2 Subset of Genes as predictors

3.2.2.3 Feature selection: finding an optimal subset of Genes

3.2.3 G5 vs G2

3.2.3.1 Subset Gene Expression

3.2.3.2 Subset of Genes as predictors

3.2.3.3 Feature selection: finding an optimal subset of Genes

3.2.4 G6 vs G2

3.2.4.1 Subset Gene Expression

3.2.4.2 Subset of Genes as predictors

3.2.4.3 Feature selection: finding an optimal subset of Genes

4 Comparison 2b: 3 Gene Subset

We want to assess whether the following 3 gene transcripts predict acute rejection (Group 3), chronic rejection (Group 4), BKV viremia (Group 5), or IFTA (Group 6), using normal post-transplant samples (Group 2) as control. The genes in the subset are:

KLF6
BNC2
CYP1B1

4.1 Group 1 as control

4.1.1 G3 vs G1

4.1.1.1 Subset Gene Expression

4.1.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

4.1.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.1.2 G4 vs G1

4.1.2.1 Subset Gene Expression

4.1.2.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

4.1.2.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.1.3 G5 vs G1

4.1.3.1 Subset Gene Expression

4.1.3.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

4.1.3.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.1.4 G6 vs G1

4.1.4.1 Subset Gene Expression

4.1.4.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

4.1.4.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.2 Group 2 as control

We want to assess whether the following 3 gene transcripts predict acute rejection (Group 3), chronic rejection (Group 4), BKV viremia (Group 5), or IFTA (Group 6), using normal post-transplant samples (Group 2) as control. The genes in the subset are:

KLF6
BNC2
CYP1B1

4.2.1 G3 vs G2

4.2.1.1 Subset Gene Expression

4.2.1.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

4.2.1.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.2.2 G4 vs G2

4.2.2.1 Subset Gene Expression

4.2.2.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

4.2.2.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.2.3 G5 vs G2

4.2.3.1 Subset Gene Expression

4.2.3.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

4.2.3.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

4.2.4 G6 vs G2

4.2.4.1 Subset Gene Expression

4.2.4.2 Subset of Genes as predictors

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the 3-gene subset. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

4.2.4.3 Feature selection: finding an optimal subset of Genes

Recursive Feature Elimination (RFE) with a Support Vector Machine, evaluated by 5-fold cross-validation, was used to select the subset of features that maximises the area under the receiver operating characteristic curve.

5 Comparison 3: Entire Data set

5.1 G3 vs G2

5.1.0.1 Feature selection: finding an optimal subset of Genes

A Support Vector Machine (SVM) was used to predict the classes from the expression values of the subset of genes. Confusion matrices were generated using 5-fold cross-validation, and the density plot corresponds to the area under the receiver operating characteristic curve from the 5-fold cross-validation.

BLOOD GENE EXPRESSION PROFILES STUDY

Kidney Transplant Patients

Mariel Barbachan

06 April 2021
