Objective

Perform a real-world data analysis using BifurcatoR tools to test for bimodality.

dxsmall data: gene expression values from pediatric leukemia patients in different fusion groups

## class: SingleCellExperiment 
## dim: 6 470 
## metadata(0):
## assays(2): counts logcounts
## rownames(6): MECOM PRDM16 ... NCAM1 KDM5D
## rowData names(4): gene_id medianLogCounts rangeLogCounts madLogCounts
## colnames(470): PAEAKL PAECCE ... PAYFYN PAYIET
## colData names(9): AgeGroup Sex ... OSI Protocol

Cohort Characteristics

First 10 columns of counts

##        PAEAKL PAECCE PAKLPD PAKTCX PANGDN PANPSV PANTCT PANTRL PANTWV PANVUF
## MECOM       2      5      6   8639   7992   1525     47      0   2361   2277
## PRDM16     16      3      3     18     25      2     17      2     13     15
## CD33     3969   3304   1825   2808   3913   3146   2256   1146   1030    839
## CD34       42  27571     39   2072   5201     15    202      8   5896    431
## NCAM1   29383    260   1062     16     90      2   4670     95      0      0
## KDM5D       1   5694      3     15   6730   5916   3583      1      4      8

Summary Statistics of dxsmall counts

##      MECOM           PRDM16             CD33              CD34        
##  Min.   :    0   Min.   :   0.00   Min.   :   18.0   Min.   :    2.0  
##  1st Qu.:    1   1st Qu.:   4.00   1st Qu.:  704.5   1st Qu.:   29.0  
##  Median :    6   Median :  10.00   Median : 1431.0   Median :  179.5  
##  Mean   : 1666   Mean   : 781.92   Mean   : 1839.8   Mean   : 2796.5  
##  3rd Qu.:   50   3rd Qu.:  73.25   3rd Qu.: 2547.5   3rd Qu.: 3714.0  
##  Max.   :30144   Max.   :8134.00   Max.   :11016.0   Max.   :35040.0  
##      NCAM1             KDM5D      
##  Min.   :    0.0   Min.   :    0  
##  1st Qu.:   15.0   1st Qu.:    3  
##  Median :   54.0   Median : 1492  
##  Mean   :  888.6   Mean   : 3501  
##  3rd Qu.:  456.0   3rd Qu.: 5829  
##  Max.   :29383.0   Max.   :61745

First 10 columns of logcounts

##            PAEAKL    PAECCE    PAKLPD    PAKTCX    PANGDN    PANPSV    PANTCT
## MECOM   1.1437896  2.159773  2.202325 12.635797 12.166000  9.918715  4.920319
## PRDM16  3.4164146  1.623430  1.485994  3.833752  3.942245  1.181463  3.534913
## CD33   11.2296987 11.162989 10.098914 11.014947 11.136046 10.962656 10.457859
## CD34    4.7225611 14.223301  4.609656 10.576675 11.546401  3.393759  6.986912
## NCAM1  14.1173132  7.502706  9.318746  3.676419  5.720816  1.181463 11.506984
## KDM5D   0.6823965 11.947954  1.485994  3.590813 11.918109 11.873421 11.124880
##           PANTRL    PANTWV    PANVUF
## MECOM   0.000000 10.507923 10.643734
## PRDM16  1.942100  3.171871  3.527659
## CD33   10.670511  9.312449  9.204888
## CD34    3.628841 11.827667  8.246217
## NCAM1   7.087735  0.000000  0.000000
## KDM5D   1.275795  1.792971  2.726274
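The printed logcounts are not a plain log2(count + 1): they appear consistent with log2(count / s + 1), where s is a per-sample size factor. The following base-R sketch checks this for the PAEAKL column only, using values copied from the two tables above. The size factor is backed out from one gene and is an inference from the printed numbers, not something the report states.

```r
# Values copied from the PAEAKL column of the counts and logcounts tables.
counts_paeakl <- c(MECOM = 2, PRDM16 = 16, CD33 = 3969,
                   CD34 = 42, NCAM1 = 29383, KDM5D = 1)
logcounts_paeakl <- c(MECOM = 1.1437896, PRDM16 = 3.4164146, CD33 = 11.2296987,
                      CD34 = 4.7225611, NCAM1 = 14.1173132, KDM5D = 0.6823965)

# Assumption: logcount = log2(count / size_factor + 1).
# Back out the implied size factor from a single gene (MECOM) ...
size_factor <- counts_paeakl[["MECOM"]] / (2^logcounts_paeakl[["MECOM"]] - 1)

# ... and use it to reconstruct the logcounts of all six genes.
reconstructed <- log2(counts_paeakl / size_factor + 1)

# Side-by-side comparison of printed and reconstructed values.
round(cbind(printed = logcounts_paeakl, reconstructed = reconstructed), 4)
```

If the assumption holds, the two columns agree to the printed precision, and a count of 0 maps to a logcount of 0 regardless of the size factor.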

Summary Statistics of dxsmall logcounts

##      MECOM            PRDM16            CD33             CD34       
##  Min.   : 0.000   Min.   : 0.000   Min.   : 4.253   Min.   : 1.484  
##  1st Qu.: 1.238   1st Qu.: 2.222   1st Qu.: 9.485   1st Qu.: 4.856  
##  Median : 2.653   Median : 3.357   Median :10.445   Median : 7.565  
##  Mean   : 4.417   Mean   : 4.787   Mean   :10.144   Mean   : 8.085  
##  3rd Qu.: 5.521   3rd Qu.: 6.077   3rd Qu.:11.090   3rd Qu.:11.559  
##  Max.   :14.512   Max.   :12.697   Max.   :12.681   Max.   :14.593  
##      NCAM1            KDM5D       
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 3.751   1st Qu.: 1.873  
##  Median : 5.782   Median :11.124  
##  Mean   : 6.329   Mean   : 7.450  
##  3rd Qu.: 8.808   3rd Qu.:12.276  
##  Max.   :14.117   Max.   :15.161

rowData: each row (gene) carries its own metadata, stored as a DataFrame that describes its features.

## DataFrame with 6 rows and 4 columns
##                   gene_id medianLogCounts rangeLogCounts madLogCounts
##               <character>       <numeric>      <numeric>    <numeric>
## MECOM  ENSG00000085276.19         1.83653        14.5121      2.44570
## PRDM16 ENSG00000142611.17         3.32100        12.6973      2.85342
## CD33   ENSG00000105383.15        10.05794         9.9846      1.41178
## CD34   ENSG00000174059.17        10.86344        16.0191      3.86343
## NCAM1  ENSG00000149294.17         4.93129        15.8618      3.14829
## KDM5D  ENSG00000012817.16         7.56726        15.1607      7.56651

colData: each column (sample) carries its own metadata, stored as a DataFrame that describes its features.

## DataFrame with 470 rows and 9 columns
##        AgeGroup      Sex      FAB BlastPercent       fusion FusionGroup
##        <factor> <factor> <factor>    <numeric>     <factor>    <factor>
## PAEAKL   Child    Female       M4           NA KMT2A-MLLT1          MLL
## PAECCE   Infant   Male         M4           NA KMT2A-MLLT10         MLL
## PAKLPD   Infant   Female       M4           NA KMT2A-ELL            MLL
## PAKTCX   Child    Female       M4           NA KMT2A-ELL            MLL
## PANGDN   AYA      Male         M4           NA KMT2A-MLLT4          MLL
## ...         ...      ...      ...          ...          ...         ...
## PAYBSY    AYA     Female       M5           NA   NUP98-NSD1        NSD1
## PAYDUC    AYA     Female       M5           NA   NUP98-NSD1        NSD1
## PAYETG    Child   Female       NA           NA   NUP98-NSD1        NSD1
## PAYFYN    Child   Male         NA           NA   NUP98-NSD1        NSD1
## PAYIET    Child   Male         M1           NA   NUP98-NSD1        NSD1
##               OS       OSI    Protocol
##        <numeric> <numeric> <character>
## PAEAKL       163         1    CCG-2961
## PAECCE       147         1    CCG-2961
## PAKLPD      2707         0    CCG-2961
## PAKTCX      1896         1    CCG-2961
## PANGDN      3063         0    AAML03P1
## ...          ...       ...         ...
## PAYBSY       763         1    AAML1031
## PAYDUC       420         1    AAML1031
## PAYETG        NA         0    AAML1031
## PAYFYN        NA         0    AAML1031
## PAYIET       577         1    AAML1031

Age Group

Sex

Fusion Group

Density plot of MECOM logcounts by fusion group

Mclust fit: fit the data with a Gaussian mixture model under the hypothesis that two clusters exist, by setting the number of groups to 2.
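The idea behind a two-component fit can be illustrated without the package. The following is a minimal base-R sketch of expectation-maximization for a two-component univariate Gaussian mixture. It is a stand-in for illustration, not a reproduction of what mclust does internally (with mclust installed, `Mclust(x, G = 2)` is the usual call). The helper name `fit_two_gaussians` is hypothetical, and the input is simulated from the MLL estimates reported below, not taken from dxsmall.

```r
# Hypothetical helper: EM for a two-component univariate Gaussian mixture.
fit_two_gaussians <- function(x, max_iter = 500, tol = 1e-8) {
  # Crude starting values: split the data at the median.
  mu <- c(mean(x[x <= median(x)]), mean(x[x > median(x)]))
  sigma <- rep(sd(x), 2)
  lambda <- c(0.5, 0.5)
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities, one column per component.
    dens <- cbind(lambda[1] * dnorm(x, mu[1], sigma[1]),
                  lambda[2] * dnorm(x, mu[2], sigma[2]))
    loglik <- sum(log(rowSums(dens)))
    post <- dens / rowSums(dens)  # posterior membership probabilities
    # M-step: update proportions, means and standard deviations.
    n_k <- colSums(post)
    lambda <- n_k / length(x)
    mu <- colSums(post * x) / n_k
    sigma <- sqrt(colSums(post * outer(x, mu, "-")^2) / n_k)
    # Stop once the log-likelihood no longer improves.
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(mean = mu, sd = sigma, proportion = lambda,
       classification = max.col(post, ties.method = "first"),
       uncertainty = 1 - apply(post, 1, max),  # 1 - largest posterior
       loglik = loglik)
}

# Simulated data (assumption): 1000 draws using the MLL estimates.
set.seed(1)
x <- c(rnorm(710, mean = 2.55, sd = 1.73), rnorm(290, mean = 12.2, sd = 1.28))
fit <- fit_two_gaussians(x)

round(rbind(mean = fit$mean, sd = fit$sd, proportion = fit$proportion), 2)
table(fit$classification)
```

The returned `classification` and `uncertainty` follow the same convention as the per-sample output shown below: the class is the component with the larger posterior probability, and the uncertainty is one minus that probability.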

mclust fit object density plot in MLL

## [1] "Mean of the two clusters in MLL are: 2.55  12.2"
## [1] "SD of the two clusters in MLL are: 1.73  1.28"
## [1] "Mixing proportion of the two clusters in MLL are: 0.71  0.29"
## [1] "The p-value of bimodality in MLL is: 0.001"

Classification for each cell in MLL
## PAEAKL PAECCE PAKLPD PAKTCX PANGDN PANPSV PANTCT PANTRL PANTWV PANVUF PANXWX 
##      1      1      1      2      2      2      1      1      2      2      2 
## PAPAJE PAPWZR PAPXAZ PAPZTC PARAEF PARBIU PARBRA PARBXE PARDDY 
##      1      2      1      2      1      1      1      1      1
Uncertainty for each cell in MLL
##       PAEAKL       PAECCE       PAKLPD       PAKTCX       PANGDN       PANPSV 
## 0.000000e+00 2.442491e-14 3.153033e-14 8.856174e-08 3.897041e-07 1.087111e-03 
##       PANTCT       PANTRL       PANTWV       PANVUF       PANXWX       PAPAJE 
## 1.308371e-07 0.000000e+00 1.186693e-04 7.218878e-05 5.477767e-02 1.110223e-15 
##       PAPWZR       PAPXAZ       PAPZTC       PARAEF       PARBIU       PARBRA 
## 2.478551e-09 5.213963e-11 3.607619e-08 0.000000e+00 9.401147e-12 1.776357e-15 
##       PARBXE       PARDDY 
## 2.435829e-13 7.303972e-02
Ameijeiras-Alonso et al. Excess Mass Fit
## [1] 0
Bimodality Coefficient
##      logcount_mll
## [1,]         TRUE
Cheng and Hall Excess Mass
## [1] 0
Fisher and Marron Cramér-von Mises
## [1] 0
Hall and York Bandwidth test
## [1] 0
Hartigans’ dip test
## [1] 0
Silverman Bandwidth
## [1] 0
mixR

## [1] "The bimodality p-value of MLL is: 0"

mclust fit object density plot in NSD1

## [1] "Mean of the two clusters in NSD1 are: 0.99  3.39"
## [1] "SD of the two clusters in NSD1 are: 0.94  0.94"
## [1] "Mixing proportion of the two clusters in NSD1 are: 0.92  0.08"
## [1] "The p-value of bimodality in NSD1 is: 0.01"

Classification for each cell in NSD1
## PACEFU PADYIR PADYMH PADYTY PAEAFC PAEFEJ PAEFHC PAEPYP PAKLTN PAKTGM PAKTLL 
##      1      1      1      1      1      1      1      1      1      1      1 
## PALATT PALBCI PALFZW PALKYG PAMXZY PANXPU PAPXUF PARCRW PARGXP 
##      1      1      1      1      1      1      1      1      1
Uncertainty for each cell in NSD1
##       PACEFU       PADYIR       PADYMH       PADYTY       PAEAFC       PAEFEJ 
## 0.0018955582 0.1550247550 0.0932172510 0.0002792864 0.1152872198 0.0181423604 
##       PAEFHC       PAEPYP       PAKLTN       PAKTGM       PAKTLL       PALATT 
## 0.0145427299 0.0025108108 0.0782639316 0.3214504381 0.0002792864 0.0208546730 
##       PALBCI       PALFZW       PALKYG       PAMXZY       PANXPU       PAPXUF 
## 0.0061003281 0.1314722688 0.0454696208 0.0831176642 0.0066264523 0.0002792864 
##       PARCRW       PARGXP 
## 0.0021590398 0.0248007667
Ameijeiras-Alonso et al. Excess Mass Fit
## [1] 0
Bimodality Coefficient
##      logcount_nsd1
## [1,]         FALSE
Cheng and Hall Excess Mass
## [1] 0
Fisher and Marron Cramér-von Mises
## [1] 0
Hall and York Bandwidth test
## [1] 0.174
Hartigans’ dip test
## [1] 0
Silverman Bandwidth
## [1] 0.426
mixR

## [1] "The bimodality p-value of NSD1 is: 0"
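Unlike the other tests, the bimodality coefficient is reported as TRUE/FALSE instead of a p-value. A minimal base-R sketch of one common definition follows: bias-corrected sample skewness and excess kurtosis, with values above 5/9 (about 0.555) taken as a sign of bimodality. Whether the report used exactly this formula and cutoff is an assumption. The helper names are hypothetical, and the inputs are simulated from the mixture estimates printed above, not taken from dxsmall.

```r
# Bimodality coefficient from bias-corrected skewness and excess kurtosis.
bimodality_coefficient <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  g1 <- mean(d^3) / m2^1.5    # moment-based skewness
  g2 <- mean(d^4) / m2^2 - 3  # moment-based excess kurtosis
  skew <- g1 * sqrt(n * (n - 1)) / (n - 2)
  kurt <- (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
  (skew^2 + 1) / (kurt + 3 * (n - 1)^2 / ((n - 2) * (n - 3)))
}

# Hypothetical helper: draw n values from a two-component Gaussian mixture.
r_two_gaussians <- function(n, mean, sd, prop) {
  k <- sample(1:2, n, replace = TRUE, prob = prop)
  rnorm(n, mean = mean[k], sd = sd[k])
}

set.seed(2)
# Simulated stand-ins (assumption) using the reported MLL and NSD1 estimates.
x_mll  <- r_two_gaussians(2000, mean = c(2.55, 12.2), sd = c(1.73, 1.28),
                          prop = c(0.71, 0.29))
x_nsd1 <- r_two_gaussians(2000, mean = c(0.99, 3.39), sd = c(0.94, 0.94),
                          prop = c(0.92, 0.08))

bc_mll  <- bimodality_coefficient(x_mll)
bc_nsd1 <- bimodality_coefficient(x_nsd1)
c(MLL = bc_mll > 5 / 9, NSD1 = bc_nsd1 > 5 / 9)
```

Under these assumptions the MLL-like mixture should land above the cutoff and the NSD1-like mixture below it, in line with the TRUE and FALSE results reported above. A small, close second component barely changes skewness and kurtosis, which is why the coefficient can miss it.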

Parameters Grid

Power & False Positive
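As a rough illustration of what these two quantities mean, the sketch below estimates them by simulation in base R. Power is the fraction of simulated bimodal datasets that a test flags, and the false positive rate is the fraction of simulated unimodal datasets that it flags. The decision rule used here (bimodality coefficient above 5/9), the sample sizes, the number of simulations and the helper names are all assumptions chosen to keep the example self-contained. This is not the BifurcatoR procedure.

```r
# Bimodality coefficient (bias-corrected skewness and excess kurtosis).
bimodality_coefficient <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  skew <- (mean(d^3) / m2^1.5) * sqrt(n * (n - 1)) / (n - 2)
  kurt <- (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (mean(d^4) / m2^2 - 3) + 6)
  (skew^2 + 1) / (kurt + 3 * (n - 1)^2 / ((n - 2) * (n - 3)))
}

# Hypothetical helper: fraction of simulated datasets flagged as bimodal.
rejection_rate <- function(simulate, n_sim = 200, threshold = 5 / 9) {
  mean(replicate(n_sim, bimodality_coefficient(simulate()) > threshold))
}

# Alternative: mixture with the MLL estimates (means 2.55 and 12.2,
# SDs 1.73 and 1.28, proportions 0.71 and 0.29).
sim_bimodal <- function(n) {
  k <- sample(1:2, n, replace = TRUE, prob = c(0.71, 0.29))
  rnorm(n, mean = c(2.55, 12.2)[k], sd = c(1.73, 1.28)[k])
}

# Null: a single normal distribution (the coefficient ignores location/scale).
sim_unimodal <- function(n) rnorm(n)

set.seed(42)
sample_sizes <- c(25, 50, 100)  # assumed grid of sample sizes
result <- data.frame(
  n = sample_sizes,
  power = sapply(sample_sizes, function(n) rejection_rate(function() sim_bimodal(n))),
  false_positive = sapply(sample_sizes, function(n) rejection_rate(function() sim_unimodal(n)))
)
result
```

Swapping in another test only requires replacing the rule inside `rejection_rate`, for example with a check of whether a p-value falls below 0.05.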

Fusion Group = MLL

Power

False Positive

Power with False Positive

Fusion Group = NSD1

Power

False Positive

Power with False Positive