Objective

Perform a real-world data analysis with BifurcatoR tools to test gene expression for bimodality.

dxsmall data: gene expression values for 6 genes in 470 pediatric leukemia samples from several fusion groups, stored as a SingleCellExperiment.

## class: SingleCellExperiment 
## dim: 6 470 
## metadata(0):
## assays(2): counts logcounts
## rownames(6): MECOM PRDM16 ... NCAM1 KDM5D
## rowData names(4): gene_id medianLogCounts rangeLogCounts madLogCounts
## colnames(470): PAEAKL PAECCE ... PAYFYN PAYIET
## colData names(8): AgeGroup Sex ... OS OSI

First 10 columns of counts

##        PAEAKL PAECCE PAKLPD PAKTCX PANGDN PANPSV PANTCT PANTRL PANTWV PANVUF
## MECOM       2      5      6   8639   7992   1525     47      0   2361   2277
## PRDM16     16      3      3     18     25      2     17      2     13     15
## CD33     3969   3304   1825   2808   3913   3146   2256   1146   1030    839
## CD34       42  27571     39   2072   5201     15    202      8   5896    431
## NCAM1   29383    260   1062     16     90      2   4670     95      0      0
## KDM5D       1   5694      3     15   6730   5916   3583      1      4      8

Summary Statistics of dxsmall counts

##      MECOM           PRDM16             CD33              CD34        
##  Min.   :    0   Min.   :   0.00   Min.   :   18.0   Min.   :    2.0  
##  1st Qu.:    1   1st Qu.:   4.00   1st Qu.:  704.5   1st Qu.:   29.0  
##  Median :    6   Median :  10.00   Median : 1431.0   Median :  179.5  
##  Mean   : 1666   Mean   : 781.92   Mean   : 1839.8   Mean   : 2796.5  
##  3rd Qu.:   50   3rd Qu.:  73.25   3rd Qu.: 2547.5   3rd Qu.: 3714.0  
##  Max.   :30144   Max.   :8134.00   Max.   :11016.0   Max.   :35040.0  
##      NCAM1             KDM5D      
##  Min.   :    0.0   Min.   :    0  
##  1st Qu.:   15.0   1st Qu.:    3  
##  Median :   54.0   Median : 1492  
##  Mean   :  888.6   Mean   : 3501  
##  3rd Qu.:  456.0   3rd Qu.: 5829  
##  Max.   :29383.0   Max.   :61745
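R's summary() reports the minimum, the quartiles, the mean, and the maximum. Its quartiles follow R's default quantile definition (type 7, linear interpolation between order statistics). The minimal Python sketch below reproduces that summary for one gene, assuming type-7 quartiles. It uses only the 10 MECOM counts printed above, so its numbers differ from the 470-sample values in the table.

```python
import statistics

def r_style_summary(values):
    """Min, Q1, median, mean, Q3, max, in the spirit of R's summary().

    statistics.quantiles(..., method="inclusive") interpolates linearly between
    order statistics, which matches R's default quantile type 7.
    """
    xs = sorted(values)
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "Min.": xs[0],
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(xs),
        "3rd Qu.": q3,
        "Max.": xs[-1],
    }

# MECOM counts of the 10 samples printed above (not the full 470-sample row).
mecom_counts = [2, 5, 6, 8639, 7992, 1525, 47, 0, 2361, 2277]
for name, value in r_style_summary(mecom_counts).items():
    print(f"{name:>8}: {value:.2f}")
```

The mean far above the median (2285.4 versus 786 for these 10 values) reflects the same right skew visible in the full-data table.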

First 10 columns of logcounts

##            PAEAKL    PAECCE    PAKLPD    PAKTCX    PANGDN    PANPSV    PANTCT
## MECOM   1.1437896  2.159773  2.202325 12.635797 12.166000  9.918715  4.920319
## PRDM16  3.4164146  1.623430  1.485994  3.833752  3.942245  1.181463  3.534913
## CD33   11.2296987 11.162989 10.098914 11.014947 11.136046 10.962656 10.457859
## CD34    4.7225611 14.223301  4.609656 10.576675 11.546401  3.393759  6.986912
## NCAM1  14.1173132  7.502706  9.318746  3.676419  5.720816  1.181463 11.506984
## KDM5D   0.6823965 11.947954  1.485994  3.590813 11.918109 11.873421 11.124880
##           PANTRL    PANTWV    PANVUF
## MECOM   0.000000 10.507923 10.643734
## PRDM16  1.942100  3.171871  3.527659
## CD33   10.670511  9.312449  9.204888
## CD34    3.628841 11.827667  8.246217
## NCAM1   7.087735  0.000000  0.000000
## KDM5D   1.275795  1.792971  2.726274
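The printed logcounts appear consistent with the transform logcount = log2(count / s + 1), where s is a single size factor per sample. This is an inference from the two tables, not something the source states. It is the form used by default in, for example, scater's logNormCounts. The sketch below inverts that assumed transform for sample PAEAKL. If the assumption holds, every gene in the sample should imply nearly the same s.

```python
import math

def implied_size_factor(count, logcount):
    """Size factor s implied by the ASSUMED transform logcount = log2(count / s + 1).

    Undefined for a zero count, since any s maps a zero count to a logcount of 0.
    """
    if count == 0:
        raise ValueError("a zero count carries no information about s")
    return count / (2.0 ** logcount - 1.0)

def log_normalize(count, size_factor):
    """Forward direction of the same assumed transform."""
    return math.log2(count / size_factor + 1.0)

# (count, logcount) pairs for sample PAEAKL, copied from the two tables above.
paeakl = {
    "MECOM": (2, 1.1437896),
    "PRDM16": (16, 3.4164146),
    "CD33": (3969, 11.2296987),
    "CD34": (42, 4.7225611),
    "NCAM1": (29383, 14.1173132),
    "KDM5D": (1, 0.6823965),
}

factors = {gene: implied_size_factor(c, lc) for gene, (c, lc) in paeakl.items()}
for gene, s in factors.items():
    print(f"{gene:>6}: s = {s:.4f}")
```

All six genes give s ≈ 1.653, which supports the assumed form for this sample. The same check can be repeated for any other column with non-zero counts.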

Summary Statistics of dxsmall logcounts

##      MECOM            PRDM16            CD33             CD34       
##  Min.   : 0.000   Min.   : 0.000   Min.   : 4.253   Min.   : 1.484  
##  1st Qu.: 1.238   1st Qu.: 2.222   1st Qu.: 9.485   1st Qu.: 4.856  
##  Median : 2.653   Median : 3.357   Median :10.445   Median : 7.565  
##  Mean   : 4.417   Mean   : 4.787   Mean   :10.144   Mean   : 8.085  
##  3rd Qu.: 5.521   3rd Qu.: 6.077   3rd Qu.:11.090   3rd Qu.:11.559  
##  Max.   :14.512   Max.   :12.697   Max.   :12.681   Max.   :14.593  
##      NCAM1            KDM5D       
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 3.751   1st Qu.: 1.873  
##  Median : 5.782   Median :11.124  
##  Mean   : 6.329   Mean   : 7.450  
##  3rd Qu.: 8.808   3rd Qu.:12.276  
##  Max.   :14.117   Max.   :15.161

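rowData: per-gene (row) metadata, stored as a DataFrame. Besides the Ensembl gene_id, it holds the median, range, and median absolute deviation (MAD) of each gene's logcounts. These values do not coincide with the summaries of the 470 samples above (for example, the median KDM5D logcount is 7.57 here but 11.12 in the summary), so they were presumably computed on a different set of samples.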

## DataFrame with 6 rows and 4 columns
##                   gene_id medianLogCounts rangeLogCounts madLogCounts
##               <character>       <numeric>      <numeric>    <numeric>
## MECOM  ENSG00000085276.19         1.83653        14.5121      2.44570
## PRDM16 ENSG00000142611.17         3.32100        12.6973      2.85342
## CD33   ENSG00000105383.15        10.05794         9.9846      1.41178
## CD34   ENSG00000174059.17        10.86344        16.0191      3.86343
## NCAM1  ENSG00000149294.17         4.93129        15.8618      3.14829
## KDM5D  ENSG00000012817.16         7.56726        15.1607      7.56651
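The three summary columns can be sketched as follows. The scaling constant 1.4826 is an assumption: it is R's mad() default, which makes the MAD comparable to a standard deviation for normal data, but the source does not say how madLogCounts was scaled. The input is again only the 10 MECOM logcounts printed above.

```python
import statistics

def row_stats(values, mad_constant=1.4826):
    """Median, range (max - min) and scaled MAD of one gene's logcounts.

    mad_constant=1.4826 mirrors R's mad() default; whether rowData used this
    scaling is an ASSUMPTION.
    """
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    return {
        "median": med,
        "range": max(values) - min(values),
        "mad": mad_constant * statistics.median(abs_dev),
    }

# MECOM logcounts of the 10 samples printed above; the rowData values come from
# far more samples, so these numbers will not match them.
mecom_log = [1.1437896, 2.159773, 2.202325, 12.635797, 12.166000,
             9.918715, 4.920319, 0.000000, 10.507923, 10.643734]
print(row_stats(mecom_log))
```

A large MAD relative to the range, as for KDM5D in rowData, is what a gene split into a low and a high expression group tends to produce.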

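colData: per-sample (column) metadata, stored as a DataFrame. The columns are age group, sex, FAB subtype, blast percentage, fusion, fusion group, and OS and OSI (presumably overall survival time and its event indicator).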

## DataFrame with 470 rows and 8 columns
##        AgeGroup      Sex      FAB BlastPercent       fusion FusionGroup
##        <factor> <factor> <factor>    <numeric>     <factor>    <factor>
## PAEAKL   Child    Female       M4           NA KMT2A-MLLT1          MLL
## PAECCE   Infant   Male         M4           NA KMT2A-MLLT10         MLL
## PAKLPD   Infant   Female       M4           NA KMT2A-ELL            MLL
## PAKTCX   Child    Female       M4           NA KMT2A-ELL            MLL
## PANGDN   AYA      Male         M4           NA KMT2A-MLLT4          MLL
## ...         ...      ...      ...          ...          ...         ...
## PAYBSY    AYA     Female       M5           NA   NUP98-NSD1        NSD1
## PAYDUC    AYA     Female       M5           NA   NUP98-NSD1        NSD1
## PAYETG    Child   Female       NA           NA   NUP98-NSD1        NSD1
## PAYFYN    Child   Male         NA           NA   NUP98-NSD1        NSD1
## PAYIET    Child   Male         M1           NA   NUP98-NSD1        NSD1
##               OS       OSI
##        <numeric> <numeric>
## PAEAKL       163         1
## PAECCE       147         1
## PAKLPD      2707         0
## PAKTCX      1896         1
## PANGDN      3063         0
## ...          ...       ...
## PAYBSY       763         1
## PAYDUC       420         1
## PAYETG        NA         0
## PAYFYN        NA         0
## PAYIET       577         1

Plots of the sample distribution by Age Group, Gender, and Fusion Group

Density plot of MECOM logcounts by fusion group
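A density plot like this is typically a Gaussian kernel density estimate. In R, density() defaults to a Gaussian kernel with the bw.nrd0 bandwidth. The sketch below follows the bw.nrd0 formula but differs from R in two ways: it evaluates the estimate directly at each grid point instead of using R's FFT-based binning, and it uses only the 10 MECOM logcounts printed above, whereas the plot is drawn per fusion group.

```python
import math
import statistics

def bw_nrd0(xs):
    """Silverman's rule-of-thumb bandwidth, following the formula of R's bw.nrd0."""
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    lo = min(sd, (q3 - q1) / 1.34)
    if lo == 0:  # same fallbacks as bw.nrd0 for degenerate data
        lo = sd or abs(xs[0]) or 1.0
    return 0.9 * lo * len(xs) ** -0.2

def gaussian_kde(xs, grid, bw=None):
    """Gaussian kernel density estimate evaluated directly at each grid point."""
    h = bw if bw is not None else bw_nrd0(xs)
    scale = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs) for g in grid]

# MECOM logcounts of the 10 samples printed above (illustration only).
mecom_log = [1.1437896, 2.159773, 2.202325, 12.635797, 12.166000,
             9.918715, 4.920319, 0.000000, 10.507923, 10.643734]
grid = [i * 0.1 for i in range(161)]  # 0.0 .. 16.0
density = gaussian_kde(mecom_log, grid)
peak = max(range(len(grid)), key=lambda i: density[i])
print(f"bandwidth = {bw_nrd0(mecom_log):.3f}, highest density near x = {grid[peak]:.1f}")
```

With only 10 points the rule-of-thumb bandwidth is wide, so two modes can be blurred together. This is one reason the analysis fits an explicit mixture model next.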

Mclust fit: the MECOM logcounts of one fusion group (MLL below) are fitted with a Gaussian mixture model under the hypothesis that two clusters exist, by fixing the number of mixture components at G = 2.
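Mclust's own procedure, with its model selection and hierarchical initialisation, is not reproduced here. As a minimal sketch of the underlying idea, the code below fits a univariate two-component Gaussian mixture with unequal variances by expectation-maximisation (EM). It then labels each point by its highest posterior probability. The median-split initialisation and the simulated input (loosely shaped like the MLL estimates reported for this fit) are illustrative assumptions, not real data.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_two_component_gmm(xs, max_iter=1000, tol=1e-9, min_sd=1e-3):
    """EM for a univariate 2-component Gaussian mixture with unequal variances.

    Returns (means, sds, proportions, labels), with labels 1/2 as in mclust output.
    """
    srt = sorted(xs)
    half = len(srt) // 2
    groups = (srt[:half], srt[half:])  # crude start: split the data at the median
    mu = [statistics.fmean(g) for g in groups]
    sd = [max(statistics.pstdev(g), min_sd) for g in groups]
    pi = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each point.
        resp, ll = [], 0.0
        for x in xs:
            w = [pi[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(w)
            ll += math.log(total)
            resp.append([wk / total for wk in w])
        # M-step: weighted mixing proportions, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), min_sd)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    # Maximum a posteriori classification.
    labels = [1 + max(range(2), key=lambda k: r[k]) for r in resp]
    return mu, sd, pi, labels

rng = random.Random(42)
data = [rng.gauss(2.5, 1.7) if rng.random() < 0.7 else rng.gauss(12.2, 1.3)
        for _ in range(400)]
means, sds, props, labels = fit_two_component_gmm(data)
print("means:", [round(m, 2) for m in means])
print("sds:", [round(s, 2) for s in sds])
print("proportions:", [round(p, 3) for p in props])
```

Allowing each component its own variance matches the two different SDs reported for MLL below. An equal-variance fit would force a single SD on both clusters.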

Density plot of the mclust fit in MLL

## [1] "Mean of the two clusters in MLL are: 2.55259783079578  12.2014398882966"
## [1] "SD of the two clusters in MLL are: 1.7339533318677  1.2803192180987"
## [1] "Mixing proportion of the two clusters in MLL are: 0.713896457765668  0.286103542234332"
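Given the fitted parameters printed above, each sample's cluster label follows from the posterior probability of each component. The sample goes to the component with the highest posterior, the maximum a posteriori rule that mclust's classification reflects. The sketch below plugs in the MLL estimates and checks four samples whose MECOM logcounts appear in the logcounts table. It also locates, by bisection, the logcount between the two means where the two posteriors are equal.

```python
import math

# MLL estimates printed above for the G = 2 fit of MECOM logcounts.
MEANS = (2.55259783079578, 12.2014398882966)
SDS = (1.7339533318677, 1.2803192180987)
PROPS = (0.713896457765668, 0.286103542234332)

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior(x):
    """Posterior probability of cluster 1 and cluster 2 for a logcount x."""
    w = [p * normal_pdf(x, m, s) for m, s, p in zip(MEANS, SDS, PROPS)]
    total = sum(w)
    return [wk / total for wk in w]

def classify(x):
    post = posterior(x)
    return 1 + post.index(max(post))

def boundary(lo=MEANS[0], hi=MEANS[1], iters=60):
    """Bisection for the point between the means where both posteriors equal 0.5."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if posterior(mid)[1] < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# MECOM logcounts of four MLL samples, copied from the logcounts table above.
examples = {"PAEAKL": 1.1437896, "PAKTCX": 12.635797,
            "PANPSV": 9.918715, "PANTCT": 4.920319}
for sample, x in examples.items():
    print(sample, classify(x), [round(p, 4) for p in posterior(x)])
print(f"posteriors cross at logcount ~ {boundary():.2f}")
```

With these estimates the crossing lies near 8.2. PANTCT (4.92) is therefore assigned to cluster 1 and PANPSV (9.92) to cluster 2, matching the classification below. Because the variances differ, the posteriors cross a second time far in the upper tail, well above the largest MECOM logcount (about 14.5).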
Classification of each sample in MLL (first 20 samples shown)
## PAEAKL PAECCE PAKLPD PAKTCX PANGDN PANPSV PANTCT PANTRL PANTWV PANVUF PANXWX 
##      1      1      1      2      2      2      1      1      2      2      2 
## PAPAJE PAPWZR PAPXAZ PAPZTC PARAEF PARBIU PARBRA PARBXE PARDDY 
##      1      2      1      2      1      1      1      1      1

Parameter Grid
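The source does not list the grid values used here. A grid for this kind of simulation might cross sample size, mixing proportion, component separation, and variance ratio. The sketch below builds such a grid with hypothetical parameter names and values. It plays the role of R's expand.grid, with one difference in ordering: expand.grid varies the first factor fastest, whereas itertools.product varies the last one fastest.

```python
from itertools import product

# HYPOTHETICAL grid values, chosen only to illustrate the layout.
GRID = {
    "n": [20, 50, 100, 200],       # sample size per simulated dataset
    "p": [0.1, 0.3, 0.5],          # mixing proportion of the second component
    "mean_diff": [1.0, 2.0, 4.0],  # separation of component means, in SD units
    "sd_ratio": [1.0, 2.0],        # SD of component 2 relative to component 1
}

def expand_grid(grid):
    """Every combination of the grid values, one dict per simulation scenario."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

scenarios = expand_grid(GRID)
print(len(scenarios))  # 4 * 3 * 3 * 2 = 72
print(scenarios[0])
```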

Power & False Positive
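Power is the fraction of datasets simulated from a bimodal mixture in which a method reports bimodality. The false positive rate is the same fraction for datasets simulated from a unimodal distribution. The methods compared in BifurcatoR are not reproduced here. As a stand-in, the sketch below uses the bimodality coefficient (BC) with its conventional 5/9 cut-off, which is the BC of a uniform distribution. The simulation settings are hypothetical.

```python
import math
import random

def bimodality_coefficient(xs):
    """Sample BC from bias-corrected skewness and excess kurtosis (needs n > 3)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5          # moment skewness
    g2 = m4 / m2 ** 2 - 3.0      # moment excess kurtosis
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))
    return (skew ** 2 + 1.0) / (kurt + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

BC_THRESHOLD = 5.0 / 9.0  # BC of a uniform distribution; larger suggests bimodality

def simulate(n, p, mean_diff, sd_ratio, rng):
    """n draws: N(0, 1) with probability 1 - p, else N(mean_diff, sd_ratio)."""
    return [rng.gauss(mean_diff, sd_ratio) if rng.random() < p else rng.gauss(0.0, 1.0)
            for _ in range(n)]

def rejection_rate(n, p, mean_diff, sd_ratio, n_sim=500, seed=1):
    """Fraction of simulated datasets whose BC exceeds the threshold."""
    rng = random.Random(seed)
    hits = sum(
        bimodality_coefficient(simulate(n, p, mean_diff, sd_ratio, rng)) > BC_THRESHOLD
        for _ in range(n_sim)
    )
    return hits / n_sim

power = rejection_rate(n=200, p=0.5, mean_diff=6.0, sd_ratio=1.0)           # bimodal truth
false_positive = rejection_rate(n=200, p=0.0, mean_diff=0.0, sd_ratio=1.0)  # unimodal truth
print(f"power = {power:.3f}, false positive rate = {false_positive:.3f}")
```

Running rejection_rate over every scenario of a parameter grid gives the power and false-positive surfaces that plots such as the ones below summarise. Well-separated, balanced mixtures are easy cases; small n, unbalanced proportions, and small separations are where methods tend to differ.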

Fusion Group = MLL

Power

False Positive

Comparing Methods in MLL

Fusion Group = NSD1

Power

False Positive

Comparing Methods in NSD1