Practical Mixture Models (w/Flow Cytometry Case Study)

Basic Workflow

0. Data Synthesis/Case Study

Data Case Study: Flow Cytometry & Remission Monitoring in AML

After a round of chemotherapy, doctors use flow cytometry to detect Minimal Residual Disease (MRD) with high sensitivity. The result informs whether to give consolidation chemotherapy or to move straight to conditioning chemotherapy and a bone marrow transplant (better prospects for patients with MRD after GODA followed by FLAG-IDA).

MRD Level        Description
Negative MRD     No detectable leukemia cells, or the level is below the sensitivity threshold (often undetectable or <0.01% of cells).
Low-level MRD    Leukemia cells are present but at low levels (e.g., detectable but still <0.01% or at the edge of detectability).
High-level MRD   Higher proportions of leukemia cells are still present (e.g., >0.01% of cells).

Remission refers to a state where leukemia cells are undetectable or present at extremely low levels in the body. In the context of Minimal Residual Disease (MRD), remission is often defined as leukemia cells making up less than 0.01% of cells (fewer than 1 in 10,000).
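
The 0.01% threshold is easier to reason about as a plain fraction. The sketch below is illustrative only: the event counts and the variable names (n_total, n_leukemic, mrd_threshold) are assumptions made for this example, not values from a real sample.

```r
# Hypothetical event counts from one sample (illustrative values only)
n_total    <- 500000  # all cells acquired
n_leukemic <- 12      # cells falling in the leukemia-associated gate

# 0.01% expressed as a fraction, i.e. 1 in 10,000 cells
mrd_threshold <- 1e-4

mrd_fraction    <- n_leukemic / n_total
below_threshold <- mrd_fraction < mrd_threshold

cat(sprintf("MRD fraction: %.4f%%\n", 100 * mrd_fraction))
cat("Below the 0.01% threshold:", below_threshold, "\n")
```

With these counts, 12 of 500,000 cells is 0.0024%, which is under the 0.01% threshold. Note that a sample needs well over 10,000 acquired cells before a fraction this small can be measured at all.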

In flow cytometry, a blood or bone marrow sample is collected and the cells are labeled with fluorescent antibodies that target leukemia-specific markers (CD33 and CD34 for AML). The cells are then passed through a laser in the flow cytometer, and the results are analyzed to detect abnormal cells using the FSC and SSC features.

  • Forward Scatter (FSC) represents the amount of light scattered by a cell as it passes through the laser in the flow cytometer. It is primarily used to measure the size of cells.
    • Larger cells (like blast cells or large granulocytes) scatter more light, so they appear as higher FSC values. Smaller cells (such as lymphocytes or red blood cells) scatter less light and appear with lower FSC values.
    • Gating: FSC can be used as the primary parameter to gate (select) populations of cells in flow cytometry. For example, you might use FSC to select all larger cells, like monocytes or granulocytes, before examining other properties.
  • Side Scatter (SSC) represents the amount of light scattered at a 90-degree angle to the incident light as the cells pass through the laser. This measurement gives an idea of the granularity or internal complexity of the cell. Granularity refers to the presence of internal structures, such as granules in the cytoplasm (e.g., in granulocytes), or organelles like mitochondria.
    • Granulocytes (which have granules in their cytoplasm) scatter more light at the side angle, resulting in higher SSC values. Lymphocytes and monocytes, which have fewer internal structures, scatter less light, leading to lower SSC values.
In AML, markers like CD33 and CD34 help identify specific subsets of leukemic or immature cells.
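
The FSC gating step described above can be sketched in a few lines of base R. Everything here is hypothetical: the scatter values are in arbitrary units, and the cutoffs fsc_cutoff and ssc_cutoff are made-up numbers chosen for illustration, not clinical thresholds.

```r
# Hypothetical scatter measurements in arbitrary units (illustrative values only)
events <- data.frame(
  FSC = c(35, 42, 110, 125, 90, 38),
  SSC = c(20, 25, 140, 150, 45, 18)
)

# Assumed cutoffs for this sketch, not clinical values
fsc_cutoff <- 80   # size gate
ssc_cutoff <- 100  # granularity split

# Primary gate: keep only the larger cells (high FSC)
large_cells <- events[events$FSC > fsc_cutoff, ]

# Within the size gate, flag the more granular cells (high SSC)
large_cells$granular <- large_cells$SSC > ssc_cutoff

print(large_cells)
```

In practice gates are usually drawn on the two-dimensional FSC/SSC plot rather than as a single cutoff per axis, so this is only the simplest possible version of the idea.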

A marker in flow cytometry refers to a specific molecule or protein located either on the surface or inside a cell. They are used to identify specific cell types and differentiate between normal and abnormal cells. These molecules are typically antigens (proteins or glycoproteins) that can be detected by antibodies. The antibodies are usually fluorescently labeled in flow cytometry, allowing researchers to identify and quantify cells based on the presence or absence of certain markers.

  • CD33: A marker used to identify myeloid cells. In AML, CD33 is typically found on leukemia blasts and helps diagnose the disease and monitor treatment response.
  • CD34: A marker used to identify hematopoietic stem cells and immature progenitor cells. In AML, CD34 is important for assessing the immaturity of the leukemia cells, which often correlates with a poor prognosis.

These markers are critical in flow cytometry as they provide insights into the type and status of cells, and help distinguish malignant leukemia cells from healthy cells in AML.
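
The code that produced the table dt used in the steps below is not shown here. As a rough sketch of how a comparable two-marker data set could be simulated, the block below draws three bivariate normal populations. The helper sim_population, the table name dt_sim, and all means, covariances and group sizes are assumptions for illustration; they are not the parameters behind the output printed in this document.

```r
library(data.table)

set.seed(42)

# Hypothetical helper: draw n cells from a bivariate normal population
sim_population <- function(n, mu, sigma) {
  m <- MASS::mvrnorm(n, mu = mu, Sigma = sigma)
  data.table(CD33 = m[, 1], CD34 = m[, 2])
}

# Illustrative population parameters (assumptions, not the original values)
dt_sim <- rbindlist(list(
  sim_population(300, c(8, 7), matrix(c(1.0, 0.8, 0.8, 1.0), nrow = 2)),
  sim_population(300, c(3, 3), matrix(c(1.0, 0.3, 0.3, 1.0), nrow = 2)),
  sim_population(50,  c(2, 7), matrix(c(1.5, 0.0, 0.0, 1.5), nrow = 2))
))

str(dt_sim)
```

A table built this way has the same shape as dt (two numeric columns, CD33 and CD34), so it could be passed to the later steps in its place.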

1. Import libraries

library(mclust)      # Gaussian mixture models
library(data.table)  # provides the := syntax used in the later steps

2. Fit a Gaussian Mixture Model

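By default, the Mclust function fits models with 1 to 9 components across its covariance structures and selects the combination with the highest BIC, so the number of clusters does not need to be specified in advance.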

gmm_model <- Mclust(dt)
summary(gmm_model)
---------------------------------------------------- 
Gaussian finite mixture model fitted by EM algorithm 
---------------------------------------------------- 

Mclust VVE (ellipsoidal, equal orientation) model with 3 components: 

 log-likelihood   n df       BIC       ICL
      -2439.396 650 15 -4975.946 -5070.984

Clustering table:
  1   2   3 
 48 303 299 
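
The BIC in the summary can be reproduced by hand, which also clarifies its sign. mclust reports BIC as 2 * log-likelihood - df * log(n), so larger (less negative) values are better. The sketch below only plugs in the numbers printed above.

```r
# Values copied from the summary output above
loglik <- -2439.396
n      <- 650
df     <- 15  # 2 mixing proportions + 6 means + 7 covariance parameters (VVE, 3 components, 2 dimensions)

# mclust convention: BIC = 2 * logLik - df * log(n)
bic <- 2 * loglik - df * log(n)
print(round(bic, 2))
```

The result agrees with the reported -4975.946 up to rounding of the printed log-likelihood.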

3. Visualise the Clusters

plot(gmm_model, what = "classification")
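
Besides "classification", plot() for an Mclust fit accepts "BIC", "uncertainty" and "density" for the what argument. The sketch below shows these on a small toy data set so that it runs on its own. The toy matrix and the name fit are assumptions for illustration; with the fitted model from this workflow the same calls would take gmm_model instead.

```r
library(mclust)

set.seed(1)

# Toy data: two well-separated groups (illustrative only)
toy <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 2),
  matrix(rnorm(200, mean = 6), ncol = 2)
)
colnames(toy) <- c("CD33", "CD34")

# Restrict the search to 1-4 components to keep the example fast
fit <- Mclust(toy, G = 1:4, verbose = FALSE)

plot(fit, what = "BIC")          # BIC for each covariance model and number of components
plot(fit, what = "uncertainty")  # larger symbols mark less certain assignments
plot(fit, what = "density")      # contours of the fitted mixture density
```

The BIC plot is a quick way to check whether the selected model clearly beats its neighbours or is only marginally ahead.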

4. Extract Hard Cluster Assignments

# := adds the column to dt by reference, so dtHard and dt refer to the same table
dtHard <- dt[, cluster := gmm_model$classification]
print(dtHard)
         CD33     CD34 cluster
        <num>    <num>   <num>
  1: 7.697348 5.803238       2
  2: 8.655535 6.932589       2
  3: 7.567935 6.609074       2
  4: 9.983627 9.158047       2
  5: 9.475503 8.634386       2
 ---                          
646: 1.192657 5.478287       1
647: 3.101217 7.899359       1
648: 1.175333 7.288924       1
649: 9.212893 4.185876       1
650: 1.899736 7.326883       1
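
Once each cell has a hard label, per-cluster summaries are a natural next step. As a small sketch, the block below rebuilds only the ten rows printed above (the table name shown and the summary name cluster_summary are made up for this example) and computes the size and mean marker values of each cluster. On the full table the same expression would be applied to dtHard.

```r
library(data.table)

# The ten rows printed above (first five and last five of the table)
shown <- data.table(
  CD33 = c(7.697348, 8.655535, 7.567935, 9.983627, 9.475503,
           1.192657, 3.101217, 1.175333, 9.212893, 1.899736),
  CD34 = c(5.803238, 6.932589, 6.609074, 9.158047, 8.634386,
           5.478287, 7.899359, 7.288924, 4.185876, 7.326883),
  cluster = c(2, 2, 2, 2, 2, 1, 1, 1, 1, 1)
)

# Size and mean marker expression per cluster
cluster_summary <- shown[, .(n = .N,
                             mean_CD33 = mean(CD33),
                             mean_CD34 = mean(CD34)),
                         by = cluster]
print(cluster_summary)
```

Because this uses ten rows out of 650, the means are not representative of the full clusters. The point is the grouping pattern, not the numbers.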

5. Extract Soft Cluster Membership (Fuzzy Assignments)

# Get the probability of each observation belonging to each cluster
probabilities <- gmm_model$z

# Add the probabilities to the data.table (:= modifies dt by reference)
dtSoft <- dt[, paste0("prob_cluster_", seq_len(ncol(probabilities))) := as.data.table(probabilities)]

# Print the table (data.table shows the first and last five rows)
print(dtSoft)
         CD33     CD34 cluster prob_cluster_1 prob_cluster_2 prob_cluster_3
        <num>    <num>   <num>          <num>          <num>          <num>
  1: 7.697348 5.803238       2     0.04813156   9.518683e-01   1.553119e-07
  2: 8.655535 6.932589       2     0.02546233   9.745377e-01   2.446878e-11
  3: 7.567935 6.609074       2     0.03507488   9.649251e-01   1.182811e-08
  4: 9.983627 9.158047       2     0.05189080   9.481092e-01   1.397858e-18
  5: 9.475503 8.634386       2     0.03412570   9.658743e-01   2.373624e-16
 ---                                                                       
646: 1.192657 5.478287       1     0.99968151   1.552988e-08   3.184742e-04
647: 3.101217 7.899359       1     0.99999948   5.066404e-07   1.173076e-08
648: 1.175333 7.288924       1     1.00000000   1.257839e-11   3.286451e-09
649: 9.212893 4.185876       1     0.91988641   8.011358e-02   1.460388e-08
650: 1.899736 7.326883       1     0.99999997   2.157291e-09   3.228621e-08
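
The hard labels from step 4 are the row-wise maximum of these probabilities, and one minus that maximum gives a per-cell uncertainty. The sketch below checks this on three rows copied from the output above (rows 1, 646 and 649). The names z, hard and uncertainty are chosen for this example.

```r
# Membership probabilities copied from the output above (rows 1, 646, 649)
z <- matrix(
  c(0.04813156, 9.518683e-01, 1.553119e-07,
    0.99968151, 1.552988e-08, 3.184742e-04,
    0.91988641, 8.011358e-02, 1.460388e-08),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("1", "646", "649"), paste0("prob_cluster_", 1:3))
)

# Hard assignment: the cluster with the highest membership probability
hard <- apply(z, 1, which.max)

# Uncertainty: probability mass not assigned to the chosen cluster
uncertainty <- 1 - apply(z, 1, max)

print(hard)
print(round(uncertainty, 4))
```

Row 649 is the least certain of the three, with about 8% of its probability on cluster 2. Cells like this are the ones worth reviewing before relying on the hard labels.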