kSORT assay genes (table below) as predictors of acute rejection (Group 3), chronic antibody-mediated rejection (Group 4), and BKV viremia (Group 5), compared with Groups 1 and 2.
CEL files were processed using the oligo package. Robust multichip averaging (RMA) was used to background-correct, normalize, and summarize the probe-level data. Annotations were taken from the hugene10sttranscriptcluster database. Control probes were removed before linear modelling.
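The normalization and summarization steps of RMA can be illustrated without the CEL files. The sketch below is not the oligo implementation: it uses base R only, skips background correction, and treats a small simulated matrix as a single probe set, whereas quantile normalization is normally applied across all probes on the array. The toy data and the helper name `quantile_normalize` are assumptions made for illustration.

```r
set.seed(1)
# Hypothetical toy data: 8 probes (rows) from one probe set, 4 arrays (columns).
probes <- matrix(2^rnorm(32, mean = 8, sd = 1), nrow = 8,
                 dimnames = list(paste0("probe", 1:8), paste0("array", 1:4)))

# Quantile normalization: give every array the same empirical distribution.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))   # mean of each order statistic
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Normalize on the intensity scale, then move to log2.
normalized <- log2(quantile_normalize(probes))

# Median polish: log2 intensity = overall + probe effect + array effect.
fit <- medpolish(normalized, trace.iter = FALSE)
expression <- fit$overall + fit$col       # one summarized value per array
print(round(expression, 3))
```

After normalization every column contains the same set of values in a different order, so the remaining between-array differences come from the median-polish array effects.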
## Exploratory Data Analysis plots
Exploratory data analysis was carried out to examine sample-to-sample variation. The heatmap is generated from Pearson's correlation between each pair of samples and is based on all expression values. The multidimensional scaling (MDS) plot is based on the 500 genes with the largest expression differences between samples and is colored according to sample group.
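The inputs to both plots can be sketched in base R. The example below uses a simulated expression matrix (an assumption, not the study data) to compute the sample-by-sample Pearson correlation matrix and a two-dimensional MDS layout. The distance between two samples is taken as the root mean square of their 500 largest expression differences, which follows the pairwise gene selection described in the limma `plotMDS` documentation. That the report used `plotMDS` is itself an assumption, and `pairwise_distance` is a hypothetical helper.

```r
set.seed(42)
# Hypothetical expression matrix: 2000 genes (rows) by 12 samples (columns).
group <- factor(rep(c("G1", "G2", "G3"), each = 4))
expr <- matrix(rnorm(2000 * 12, mean = 8), nrow = 2000,
               dimnames = list(NULL, paste0("S", 1:12)))
expr[1:100, group == "G3"] <- expr[1:100, group == "G3"] + 2  # planted signal

# Heatmap input: Pearson correlation between every pair of samples.
sample_cor <- cor(expr, method = "pearson")

# MDS input: for each sample pair, the root mean square of the `top`
# largest expression differences.
pairwise_distance <- function(x, top = 500) {
  n <- ncol(x)
  d <- matrix(0, n, n, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      sq <- sort((x[, i] - x[, j])^2, decreasing = TRUE)[seq_len(top)]
      d[i, j] <- d[j, i] <- sqrt(mean(sq))
    }
  }
  as.dist(d)
}

mds <- cmdscale(pairwise_distance(expr, top = 500), k = 2)

# heatmap(sample_cor, symm = TRUE)              # uncomment to draw the heatmap
# plot(mds, col = as.integer(group), pch = 19)  # uncomment to draw the MDS plot
```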
##
## Call:
## glm(formula = as.factor(Class) ~ ., family = "binomial", data = g3DF,
## weights = na.action(na.omit))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.295e-05 2.110e-08 2.110e-08 1.337e-06 1.392e-05
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.378e+02 7.043e+06 0 1
## NAMPT -6.598e+01 6.186e+05 0 1
## PSEN1 -3.426e+01 1.404e+06 0 1
## ITGAX 4.845e+01 1.244e+06 0 1
## RARA 2.579e+01 1.434e+06 0 1
## EPOR -1.624e+02 2.170e+06 0 1
## CEACAM4 3.051e+01 8.497e+05 0 1
## CFLAR -1.330e+02 6.239e+05 0 1
## NKTR 2.267e+01 2.501e+06 0 1
## RNF13 -2.661e+01 1.113e+06 0 1
## RYBP 5.044e+01 1.663e+06 0 1
## GZMK 2.884e+00 7.892e+05 0 1
## DUSP1 3.356e+01 5.276e+05 0 1
## MAPK9 -1.924e+01 2.019e+06 0 1
## IFNGR1 6.739e+01 9.345e+05 0 1
## RHEB 1.652e+00 1.606e+06 0 1
## SLC25A37 1.004e+02 7.772e+05 0 1
## RXRA -2.952e+01 4.597e+05 0 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4.0191e+01 on 48 degrees of freedom
## Residual deviance: 1.5303e-09 on 31 degrees of freedom
## AIC: 36
##
## Number of Fisher Scoring iterations: 25
##
## Call:
## glm(formula = as.factor(Class) ~ ., family = "binomial", data = g4DF,
## weights = na.action(na.omit))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.048e-05 -2.100e-08 2.100e-08 2.100e-08 3.465e-05
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.578e+02 3.845e+06 0.000 1.000
## NAMPT 1.509e+02 1.981e+06 0.000 1.000
## PSEN1 9.886e-01 4.649e+05 0.000 1.000
## ITGAX 5.560e+00 3.067e+05 0.000 1.000
## RARA 5.951e+01 3.994e+05 0.000 1.000
## EPOR -1.751e+02 8.799e+05 0.000 1.000
## CEACAM4 1.759e+02 1.427e+05 0.001 0.999
## CFLAR -5.021e+02 4.962e+05 -0.001 0.999
## NKTR -7.505e+01 4.569e+05 0.000 1.000
## RNF13 -2.477e+02 9.394e+05 0.000 1.000
## RYBP 2.469e+02 3.869e+05 0.001 0.999
## GZMK -2.018e+01 7.929e+04 0.000 1.000
## DUSP1 7.184e+01 5.400e+05 0.000 1.000
## MAPK9 9.888e+01 7.523e+05 0.000 1.000
## IFNGR1 4.297e+01 1.136e+06 0.000 1.000
## RHEB 8.816e+01 4.924e+05 0.000 1.000
## SLC25A37 2.261e+02 7.800e+05 0.000 1.000
## RXRA -2.196e+02 6.617e+05 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8.0201e+01 on 62 degrees of freedom
## Residual deviance: 8.5551e-09 on 45 degrees of freedom
## AIC: 36
##
## Number of Fisher Scoring iterations: 25
##
## Call:
## glm(formula = as.factor(Class) ~ ., family = "binomial", data = g5DF,
## weights = na.action(na.omit))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.46928 -0.21842 0.05239 0.56639 1.49113
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 219.4842 126.9207 1.729 0.0838 .
## NAMPT -14.6150 7.0697 -2.067 0.0387 *
## PSEN1 9.3106 6.2122 1.499 0.1339
## ITGAX -4.2172 4.3173 -0.977 0.3287
## RARA -2.8295 3.3621 -0.842 0.4000
## EPOR -2.4074 2.8942 -0.832 0.4055
## CEACAM4 5.9847 3.8414 1.558 0.1193
## CFLAR -2.5103 3.8041 -0.660 0.5093
## NKTR -12.1980 5.7908 -2.106 0.0352 *
## RNF13 -0.3145 5.7372 -0.055 0.9563
## RYBP 7.6547 4.3793 1.748 0.0805 .
## GZMK 1.6102 1.4330 1.124 0.2612
## DUSP1 -2.2398 3.1057 -0.721 0.4708
## MAPK9 0.2652 2.8027 0.095 0.9246
## IFNGR1 5.0440 4.2903 1.176 0.2397
## RHEB -2.6692 3.2608 -0.819 0.4130
## SLC25A37 7.1320 3.4838 2.047 0.0406 *
## RXRA -13.0264 6.4258 -2.027 0.0426 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 75.674 on 60 degrees of freedom
## Residual deviance: 37.017 on 43 degrees of freedom
## AIC: 73.017
##
## Number of Fisher Scoring iterations: 8
##
## Call:
## glm(formula = as.factor(Class) ~ ., family = binomial(link = "logit"),
## data = g34DF, weights = na.action(na.omit))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.713e-05 -2.100e-08 -2.100e-08 2.100e-08 4.292e-05
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.469e+02 6.290e+06 0 1
## NAMPT -2.262e+02 3.201e+06 0 1
## PSEN1 4.179e+01 1.369e+06 0 1
## ITGAX 4.962e+01 9.077e+05 0 1
## RARA 3.547e+01 1.976e+06 0 1
## EPOR 3.562e+01 2.293e+06 0 1
## CEACAM4 -1.600e+02 4.003e+05 0 1
## CFLAR 4.536e+02 1.340e+06 0 1
## NKTR -8.777e+00 1.362e+06 0 1
## RNF13 1.926e+02 2.329e+06 0 1
## RYBP -2.656e+02 8.842e+05 0 1
## GZMK -7.085e-01 3.969e+05 0 1
## DUSP1 -2.974e+01 1.498e+06 0 1
## MAPK9 -4.099e+01 1.259e+06 0 1
## IFNGR1 -2.080e+01 1.866e+06 0 1
## RHEB -4.273e+01 1.406e+06 0 1
## SLC25A37 -1.166e+02 1.809e+06 0 1
## RXRA 1.465e+02 2.118e+06 0 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9.4222e+01 on 69 degrees of freedom
## Residual deviance: 9.2267e-09 on 52 degrees of freedom
## AIC: 36
##
## Number of Fisher Scoring iterations: 25
##
## Call:
## glm(formula = as.factor(Class) ~ ., family = "binomial", data = g345DF,
## weights = na.action(na.omit))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9548 -0.4397 -0.1100 0.3160 2.8630
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 82.0447 48.9149 1.677 0.0935 .
## NAMPT -4.7563 3.9788 -1.195 0.2319
## PSEN1 2.5695 3.8725 0.664 0.5070
## ITGAX -1.0303 2.0023 -0.515 0.6069
## RARA -4.4329 2.6659 -1.663 0.0964 .
## EPOR -3.7469 2.3787 -1.575 0.1152
## CEACAM4 3.4260 1.4870 2.304 0.0212 *
## CFLAR -5.3213 2.4777 -2.148 0.0317 *
## NKTR -5.2602 2.7803 -1.892 0.0585 .
## RNF13 0.7688 3.6206 0.212 0.8318
## RYBP 5.6846 2.3926 2.376 0.0175 *
## GZMK -0.1151 0.7638 -0.151 0.8803
## DUSP1 0.7883 1.9093 0.413 0.6797
## MAPK9 1.8072 1.9687 0.918 0.3586
## IFNGR1 0.4555 2.4189 0.188 0.8506
## RHEB -0.7275 2.1230 -0.343 0.7318
## SLC25A37 7.0630 2.8795 2.453 0.0142 *
## RXRA -6.5280 3.1250 -2.089 0.0367 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 123.099 on 88 degrees of freedom
## Residual deviance: 52.758 on 71 degrees of freedom
## AIC: 88.758
##
## Number of Fisher Scoring iterations: 6
##
## Call:
## glm(formula = as.factor(Class) ~ ., family = "binomial", data = g6DF,
## weights = na.action(na.omit))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.133e-05 -2.110e-08 -2.110e-08 1.140e-06 8.907e-06
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.140e+02 1.703e+07 0 1
## NAMPT -2.788e+01 1.517e+06 0 1
## PSEN1 -3.440e+01 9.226e+05 0 1
## ITGAX -3.238e+01 5.802e+05 0 1
## RARA 1.144e+02 7.932e+05 0 1
## EPOR 1.321e+01 5.427e+05 0 1
## CEACAM4 -4.022e+01 7.937e+05 0 1
## CFLAR 2.325e+01 1.166e+06 0 1
## NKTR 2.551e+00 9.888e+05 0 1
## RNF13 -8.273e+01 1.509e+06 0 1
## RYBP -1.674e+01 2.715e+05 0 1
## GZMK -5.563e+00 2.155e+05 0 1
## DUSP1 -8.365e+00 4.860e+05 0 1
## MAPK9 3.151e+01 6.735e+05 0 1
## IFNGR1 1.129e+02 8.731e+05 0 1
## RHEB -1.312e+01 6.630e+05 0 1
## SLC25A37 4.719e+01 7.254e+05 0 1
## RXRA -3.416e+01 8.000e+05 0 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5.3834e+01 on 38 degrees of freedom
## Residual deviance: 6.5185e-10 on 21 degrees of freedom
## AIC: 36
##
## Number of Fisher Scoring iterations: 25
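The summaries above come from models of the form `glm(as.factor(Class) ~ ., family = "binomial")`. The fits on g3DF, g4DF, g34DF, and g6DF report standard errors around 1e+06, a residual deviance near zero, and 25 Fisher scoring iterations. This pattern usually indicates that the predictors separate the two classes completely, in which case the individual coefficient p-values are not interpretable. The sketch below reproduces the pattern on simulated data. The data frames and the helper `looks_separated` are hypothetical and not part of the original analysis.

```r
set.seed(7)
n <- 60
# Hypothetical stand-in for one of the data frames (Class plus gene columns).
toy <- data.frame(Class = rep(c("Normal", "Rejection"), each = n / 2),
                  geneA = rnorm(n), geneB = rnorm(n))

# Flag the signature of (quasi-)complete separation in a fitted binomial glm:
# either the algorithm did not converge or the residual deviance is ~0.
looks_separated <- function(fit, tol = 1e-6) {
  !fit$converged || deviance(fit) < tol
}

# Overlapping classes: the fit converges to finite estimates.
fit_ok <- glm(as.factor(Class) ~ ., family = "binomial", data = toy)

# Perfectly separable classes: geneA alone splits the two groups.
toy_sep <- toy
toy_sep$geneA <- ifelse(toy_sep$Class == "Rejection", 1, -1) +
  runif(n, -0.5, 0.5)
fit_sep <- suppressWarnings(
  glm(as.factor(Class) ~ ., family = "binomial", data = toy_sep)
)

print(c(overlapping = looks_separated(fit_ok),
        separable   = looks_separated(fit_sep)))
```

When a fit is flagged this way, a penalized or reduced model is a common alternative. Which one suits these data is outside the scope of this sketch.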
Is there any subset of these 17 genes that better predicts acute rejection (Group 3), chronic rejection (Group 4), or BKV viremia (Group 5)?
Recursive feature elimination (RFE) with 3-fold cross-validation was used to select the subset of features that maximises the area under the receiver operating characteristic curve (AUC).
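The idea behind this selection can be sketched in base R. The example below is a simplified illustration on simulated data, not the procedure used for the report. It fits a logistic model, scores each feature subset by mean held-out AUC over three folds, and repeatedly drops the feature with the smallest absolute z-value. The features are ranked on the full data for brevity, and a stricter implementation would re-rank within each training fold. The data and the helpers `auc` and `cv_auc` are hypothetical.

```r
set.seed(11)
n <- 90; p <- 6
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("gene", 1:p)))
# Hypothetical outcome driven by gene1 and gene2 only.
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - 1.5 * x[, 2]))
dat <- data.frame(Class = y, x)

# AUC from the rank-sum (Mann-Whitney) identity.
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1); n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Mean held-out AUC of a logistic model over the folds.
cv_auc <- function(dat, features, folds) {
  mean(sapply(sort(unique(folds)), function(f) {
    train <- dat[folds != f, c("Class", features), drop = FALSE]
    test  <- dat[folds == f, , drop = FALSE]
    fit <- suppressWarnings(glm(Class ~ ., family = binomial, data = train))
    auc(predict(fit, newdata = test), test$Class)
  }))
}

folds <- sample(rep(1:3, length.out = n))   # 3-fold assignment
features <- colnames(x)
results <- list()
while (length(features) >= 1) {
  results[[length(features)]] <- list(features = features,
                                      auc = cv_auc(dat, features, folds))
  if (length(features) == 1) break
  # Rank features by |z| in a fit on all rows and drop the weakest.
  fit <- suppressWarnings(glm(Class ~ ., family = binomial,
                              data = dat[, c("Class", features)]))
  z <- abs(coef(summary(fit))[-1, "z value"])
  features <- setdiff(features, names(which.min(z)))
}

aucs <- sapply(results, function(r) r$auc)  # indexed by subset size
best <- results[[which.max(aucs)]]
print(best$features)
```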
The confusion matrix below shows the proportion of correctly assigned classes for the Rejection vs Normal comparison with the kSORT dataset.
The confusion matrix below shows the proportion of correctly assigned classes for the BKV vs Normal comparison with the kSORT dataset.
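A confusion matrix expressed as proportions can be built from true and predicted labels as in the sketch below. The labels are made up for illustration. Normalizing within each true class (by column) is an assumption, since the report does not state how its proportions were normalized.

```r
# Hypothetical true and predicted labels for ten samples.
truth <- factor(c("Normal", "Normal", "Normal", "Normal", "Rejection",
                  "Rejection", "Rejection", "Rejection", "Rejection", "Normal"),
                levels = c("Normal", "Rejection"))
predicted <- factor(c("Normal", "Normal", "Rejection", "Normal", "Rejection",
                      "Rejection", "Normal", "Rejection", "Rejection", "Normal"),
                    levels = c("Normal", "Rejection"))

counts <- table(Predicted = predicted, Truth = truth)

# Column-wise proportions: the entries for each true class sum to 1.
proportions <- prop.table(counts, margin = 2)

# Overall proportion of correctly assigned samples.
accuracy <- sum(diag(counts)) / sum(counts)

print(round(proportions, 2))
print(accuracy)
```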
We want to examine whether the following five gene transcripts predict acute rejection (Group 3) or chronic rejection (Group 4), and BKV viremia (Group 5). These genes are MARCHF8, FLT3, IL1R2, PDCD1, and DCAF12.
##
## Call:
## glm(formula = as.factor(Class) ~ ., family = "binomial", data = Q2_1DF,
## weights = na.action(na.omit))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3638 -0.4914 -0.2517 0.4592 2.1738
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 64.27720 27.46152 2.341 0.019251 *
## MARCHF8 8.80456 2.46511 3.572 0.000355 ***
## FLT3 0.12287 0.77174 0.159 0.873507
## IL1R2 -0.09188 0.65049 -0.141 0.887670
## PDCD1 -1.60371 2.81293 -0.570 0.568595
## DCAF12 -12.57539 3.24842 -3.871 0.000108 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 94.222 on 69 degrees of freedom
## Residual deviance: 50.615 on 64 degrees of freedom
## AIC: 62.615
##
## Number of Fisher Scoring iterations: 6
##
## Call:
## glm(formula = as.factor(Class) ~ ., family = "binomial", data = Q2_2DF,
## weights = na.action(na.omit))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0741 -1.1699 0.6841 0.8152 1.5445
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -31.45741 20.60003 -1.527 0.1267
## MARCHF8 -3.81741 2.12124 -1.800 0.0719 .
## FLT3 0.53397 0.71490 0.747 0.4551
## IL1R2 0.05057 0.57909 0.087 0.9304
## PDCD1 0.75821 1.86053 0.408 0.6836
## DCAF12 5.39137 2.74110 1.967 0.0492 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 75.674 on 60 degrees of freedom
## Residual deviance: 69.290 on 55 degrees of freedom
## AIC: 81.29
##
## Number of Fisher Scoring iterations: 4
The limma package was used to fit a group-means parameterization. Moderated F-statistics were calculated using the eBayes function.
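In a group-means parameterization the design matrix has no intercept and one column per group, so each fitted coefficient is that group's mean expression and comparisons are formed as contrasts between coefficients. The base R sketch below shows this for a single simulated gene. The group labels and data are hypothetical, and the empirical Bayes moderation step is not reproduced here.

```r
set.seed(3)
group <- factor(rep(paste0("Group", 1:5), each = 3))

# Group-means parameterization: no intercept, one column per group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Hypothetical log2 expression values for a single gene.
y <- rnorm(length(group), mean = 8)

# Ordinary least squares: each coefficient equals that group's mean.
coefs <- lm.fit(design, y)$coefficients
group_means <- tapply(y, group, mean)

# A comparison between two groups is a difference of coefficients.
contrast <- c(Group1 = -1, Group2 = 0, Group3 = 1, Group4 = 0, Group5 = 0)
logFC_3_vs_1 <- sum(contrast * coefs)

# With limma installed, the corresponding steps would be along the lines of
# lmFit(exprs, design), contrasts.fit(fit, contrasts), and eBayes(fit).
print(round(coefs, 3))
print(round(logFC_3_vs_1, 3))
```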
## [1] 0.65633
The maximum absolute value for logFC is 0.65633.
| Gene set | NGenes | Up | Down | Mixed |
|---|---|---|---|---|
| core KT1 | 517 | 0.03 | 0.97 | 0.01 |
| core KT2 | 63 | 0.02 | 0.98 | 0.01 |
| core ENDAT | 112 | 0.34 | 0.66 | 0.11 |
| core IRITD5 | 196 | 0.66 | 0.34 | 0.13 |
| core DSAST | 19 | 0.41 | 0.59 | 0.14 |
| core CMAT | 61 | 1.00 | 0.00 | 0.26 |
| GST IQR | 62 | 0.61 | 0.39 | 0.29 |
| core GRIT1 | 39 | 0.94 | 0.06 | 0.33 |
| GSTs | 85 | 0.58 | 0.42 | 0.37 |
| core IRITD3 | 302 | 0.96 | 0.04 | 0.53 |
| CAT1 IQR | 120 | 1.00 | 0.00 | 0.56 |
| AMA | 173 | 0.99 | 0.01 | 0.64 |
| AMA IQR | 87 | 0.98 | 0.02 | 0.65 |
| CAT1 | 204 | 1.00 | 0.00 | 0.81 |
| BATs IQR | 46 | 0.61 | 0.39 | 0.97 |