Background & Objectives

  • Focus: Study cells in the mouse brain to uncover mechanisms of neurodegenerative diseases
  • Data: Neuron morphometric dataset containing 31 shape features for 2,400+ D1 and D2 medium spiny neurons (MSNs). Morphometrics are a set of features describing cell shape: size, branching angles, complexity, etc.
Neuron morphometric (shape)


  • The scientific goal:
    1. Study shape differences between two types of medium spiny neurons (D1 vs D2)
    2. Can we predict neuron type (D1 vs D2) based on cell shape features?
    3. Explore heterogeneity beyond binary D1/D2 cell types

  • My role:
    • Lead data processing, analysis and reporting
    • Dashboard development for interactive results exploration
    • Collaborate with multi-functional teams (biologists, anatomists) to translate complex results into scientific insights

  • Main Analysis:
    1. EDA & Morphometric Analysis
    2. Supervised classification: D1 vs. D2
    3. Unsupervised method to explore underlying patterns


Pre-analysis setup: Load libraries and import pre-defined functions

Section 1: EDA & Morphometric Analysis

1.1 Read data

The data has been cleaned, merged, and organized into an analyzable format

  • cell_morpho_data.csv:
    • consists of 2466 rows (2466 neurons)
    • categorical variables: brain, cell type, brain region, etc.
    • continuous variables: 31 shape features


1.2 Inclusion/Exclusion Criteria

Cell inclusion criteria:

  • Only include neurons registered to the caudoputamen (CP) region
  • Only include non-surface neurons (surface neurons might be distorted during tissue cutting)
## [1] "The dataset has 1871  neurons,  595 neurons were excluded"
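
The two inclusion criteria amount to a row filter. A minimal sketch in base R, assuming hypothetical column names `Region` and `Is_Surface` (the real column names in cell_morpho_data.csv are not shown in this report):

```r
# Toy data frame; the column names `Region` and `Is_Surface` are hypothetical.
cells <- data.frame(
  Cell_ID    = 1:6,
  Region     = c("CP", "CP", "ACB", "CP", "CP", "OT"),
  Is_Surface = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Keep neurons registered to the region of interest that are not on a cut surface.
keep     <- cells$Region == "CP" & !cells$Is_Surface
included <- cells[keep, ]

message(sprintf("The dataset has %d neurons, %d neurons were excluded",
                nrow(included), sum(!keep)))
```

Building the logical vector `keep` first makes it easy to report both the retained and the excluded counts from the same filter.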


Other pre-processing:

  • Define orders and factorize the categorical variables
  • Define feature grouping: group 31 morphometric features to angle-related, length-related and complexity-related features

1.3 EDA

  • Categorical variables: generate descriptive tables of cell type, Brain, and Striatal.Subregion

    Distribution of Type

    |Type |    n| percent|
    |:----|----:|-------:|
    |D1   | 1017|   54.4%|
    |D2   |  854|   45.6%|

    Distribution of Striatal.Subregion

    |Striatal.Subregion |   n| percent|
    |:------------------|---:|-------:|
    |CPr                | 279|     15%|
    |CPi                | 951|     51%|
    |CPc                | 641|     34%|

    Distribution of Brain

    |Brain   |   n| percent|
    |:-------|---:|-------:|
    |TME07-1 | 603|   32.2%|
    |TME08-1 | 157|    8.4%|
    |TME09-1 |  70|    3.7%|
    |TME10-1 | 327|   17.5%|
    |TME10-3 | 714|   38.2%|
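
A count-and-percent table of this kind can be produced with `table()` and `prop.table()`. The sketch below rebuilds only the Type column from the counts reported above; it is an illustration, not necessarily the helper function used for this report.

```r
# Rebuild the Type column from the counts reported above (1017 D1, 854 D2).
type <- factor(rep(c("D1", "D2"), times = c(1017, 854)), levels = c("D1", "D2"))

counts <- table(type)
freq <- data.frame(
  Type    = names(counts),
  n       = as.integer(counts),
  percent = sprintf("%.1f%%", 100 * as.numeric(prop.table(counts)))
)
print(freq)
```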



  • Categorical variables: generate bar plots of the distributions of cell type, Brain, and Striatal.Subregion

  • Continuous variables: generate histogram distribution



  • Continuous variables: generate descriptive table showing mean (SD) for morphometric features

    Morphometric Mean (SD): Overall, by Type, Striatal Subregion, and Brain

    |feature | All | Type:D1 | Type:D2 | Striatal.Subregion:CPr | Striatal.Subregion:CPi | Striatal.Subregion:CPc | Brain:TME07-1 | Brain:TME08-1 | Brain:TME09-1 | Brain:TME10-1 | Brain:TME10-3 |
    |:-------|----:|--------:|--------:|-----------------------:|-----------------------:|-----------------------:|--------------:|--------------:|--------------:|--------------:|--------------:|
    |ABEL_All | 47.36 (8.12) | 48.98 (8.29) | 45.43 (7.47) | 47.81 (7.78) | 48.29 (8.65) | 45.77 (7.17) | 47.98 (10.19) | 49.86 (7.27) | 46.25 (7.38) | 46.77 (7.37) | 46.65 (6.42) |
    |ABEL_Internal | 18.80 (4.66) | 19.73 (4.75) | 17.69 (4.31) | 19.28 (4.63) | 19.04 (4.65) | 18.22 (4.65) | 19.68 (5.24) | 20.22 (4.74) | 19.20 (5.02) | 18.17 (4.31) | 17.98 (4.01) |
    |ABEL_Terminal | 70.19 (12.40) | 72.21 (12.84) | 67.79 (11.42) | 70.20 (12.68) | 71.79 (12.87) | 67.82 (11.15) | 70.18 (14.56) | 73.91 (11.45) | 68.05 (12.39) | 69.81 (11.83) | 69.78 (10.64) |
    |BAPL_All | 55.08 (9.52) | 56.68 (9.61) | 53.17 (9.04) | 56.60 (9.06) | 56.28 (10.09) | 52.63 (8.29) | 55.68 (11.71) | 57.75 (8.56) | 53.71 (9.01) | 54.21 (8.70) | 54.51 (7.78) |
    |BAPL_Internal | 20.11 (5.15) | 21.06 (5.24) | 18.97 (4.81) | 20.84 (5.32) | 20.36 (5.09) | 19.42 (5.10) | 21.09 (5.69) | 21.75 (5.37) | 20.58 (5.60) | 19.37 (4.78) | 19.21 (4.48) |
    |BAPL_Terminal | 83.01 (14.55) | 84.95 (14.94) | 80.70 (13.74) | 84.64 (14.73) | 85.12 (15.08) | 79.18 (12.83) | 82.79 (16.74) | 86.94 (13.46) | 80.39 (14.88) | 82.25 (13.84) | 82.94 (12.88) |
    |Balancing_Factor | 0.76 (0.05) | 0.77 (0.04) | 0.75 (0.05) | 0.73 (0.06) | 0.76 (0.04) | 0.78 (0.04) | 0.76 (0.05) | 0.76 (0.04) | 0.76 (0.04) | 0.76 (0.04) | 0.76 (0.05) |
    |Bif_ampl_local | 72.82 (10.47) | 72.62 (10.60) | 73.05 (10.31) | 72.75 (10.99) | 73.02 (10.61) | 72.54 (10.02) | 76.16 (10.49) | 74.60 (9.76) | 69.17 (9.64) | 67.30 (9.08) | 72.49 (10.07) |
    |Bif_ampl_remote | 68.55 (8.57) | 67.79 (8.23) | 69.46 (8.88) | 72.85 (9.26) | 68.74 (8.12) | 66.40 (8.17) | 68.48 (9.01) | 69.39 (7.66) | 68.16 (8.29) | 66.93 (7.96) | 69.21 (8.58) |
    |Bif_tilt_local | 120.23 (9.99) | 120.54 (10.08) | 119.87 (9.88) | 119.97 (10.34) | 120.24 (10.11) | 120.33 (9.67) | 116.65 (10.35) | 118.75 (9.11) | 124.16 (9.44) | 125.38 (8.32) | 120.84 (9.36) |
    |Bif_tilt_remote | 124.30 (7.14) | 125.14 (6.77) | 123.31 (7.43) | 120.20 (7.56) | 124.29 (6.69) | 126.11 (6.85) | 123.69 (7.48) | 123.59 (6.90) | 125.05 (6.79) | 125.85 (6.31) | 124.20 (7.18) |
    |Bif_torque_local | 89.47 (10.15) | 88.90 (10.17) | 90.15 (10.10) | 90.87 (10.58) | 89.02 (9.80) | 89.53 (10.43) | 89.44 (10.83) | 88.95 (10.26) | 88.69 (11.51) | 89.51 (9.98) | 89.67 (9.48) |
    |Bif_torque_remote | 89.31 (10.26) | 89.03 (10.57) | 89.65 (9.89) | 89.84 (11.09) | 89.06 (9.79) | 89.45 (10.59) | 89.39 (10.84) | 88.38 (9.09) | 89.23 (12.25) | 90.14 (9.66) | 89.08 (10.07) |
    |Branch_Order | 5.07 (1.21) | 4.99 (1.20) | 5.16 (1.21) | 4.90 (1.28) | 5.05 (1.18) | 5.17 (1.20) | 5.05 (1.31) | 5.32 (1.22) | 4.93 (1.08) | 5.02 (1.17) | 5.06 (1.13) |
    |Centripetal_Bias | 4.48 (1.05) | 4.67 (1.10) | 4.25 (0.93) | 3.95 (0.97) | 4.37 (1.00) | 4.87 (1.01) | 4.60 (1.22) | 4.38 (0.89) | 4.42 (0.82) | 4.55 (0.95) | 4.38 (0.97) |
    |Convexity | 0.84 (0.06) | 0.84 (0.05) | 0.84 (0.06) | 0.84 (0.06) | 0.84 (0.05) | 0.83 (0.06) | 0.83 (0.06) | 0.84 (0.05) | 0.84 (0.06) | 0.84 (0.05) | 0.84 (0.05) |
    |Depth | 119.37 (33.24) | 123.48 (34.13) | 114.48 (31.47) | 115.11 (32.39) | 125.55 (32.85) | 112.06 (32.45) | 116.21 (34.90) | 132.88 (34.40) | 115.08 (33.27) | 118.80 (31.52) | 119.75 (31.60) |
    |Fractal_Dim | 1.03 (0.01) | 1.03 (0.01) | 1.03 (0.01) | 1.03 (0.01) | 1.03 (0.01) | 1.03 (0.01) | 1.03 (0.01) | 1.03 (0.01) | 1.03 (0.01) | 1.03 (0.01) | 1.03 (0.01) |
    |Height | 214.21 (33.80) | 222.87 (34.98) | 203.89 (29.16) | 209.53 (34.43) | 216.25 (33.28) | 213.21 (34.09) | 218.14 (37.25) | 228.59 (32.42) | 214.53 (35.26) | 212.08 (30.82) | 208.66 (30.79) |
    |Length | 2843.84 (848.87) | 2924.02 (904.32) | 2748.37 (767.36) | 2499.76 (743.16) | 2932.40 (800.66) | 2862.22 (922.14) | 2746.14 (860.11) | 3186.88 (909.39) | 2650.40 (881.81) | 2867.01 (846.40) | 2859.29 (802.09) |
    |Max_EucDistance | 153.17 (22.08) | 159.26 (22.33) | 145.91 (19.43) | 152.68 (21.09) | 154.64 (22.22) | 151.20 (22.15) | 156.16 (23.55) | 164.44 (24.35) | 153.81 (21.61) | 151.21 (20.33) | 148.99 (19.79) |
    |Max_PathDistance | 187.29 (27.62) | 193.17 (27.90) | 180.29 (25.58) | 189.58 (25.33) | 189.45 (27.38) | 183.10 (28.46) | 192.25 (29.08) | 198.57 (31.89) | 183.48 (23.17) | 182.05 (23.86) | 183.40 (25.95) |
    |N_bifs | 23.74 (8.18) | 23.63 (8.42) | 23.88 (7.88) | 19.94 (6.62) | 24.05 (7.69) | 24.94 (8.98) | 22.87 (8.69) | 25.37 (8.19) | 22.51 (7.95) | 24.31 (8.07) | 23.99 (7.71) |
    |N_branch | 52.76 (16.80) | 52.66 (17.30) | 52.88 (16.19) | 44.79 (13.50) | 53.26 (15.68) | 55.48 (18.58) | 51.01 (17.87) | 56.03 (16.68) | 49.99 (16.44) | 53.93 (16.63) | 53.25 (15.82) |
    |N_stems | 5.27 (1.56) | 5.40 (1.61) | 5.12 (1.49) | 4.91 (1.50) | 5.16 (1.41) | 5.60 (1.74) | 5.28 (1.69) | 5.29 (1.42) | 4.96 (1.43) | 5.31 (1.62) | 5.27 (1.47) |
    |N_tips | 29.02 (8.69) | 29.03 (8.95) | 29.00 (8.37) | 24.85 (6.96) | 29.21 (8.04) | 30.54 (9.67) | 28.15 (9.25) | 30.66 (8.54) | 27.47 (8.54) | 29.62 (8.63) | 29.26 (8.18) |
    |Partition_asymmetry | 0.39 (0.09) | 0.39 (0.09) | 0.40 (0.10) | 0.40 (0.10) | 0.39 (0.09) | 0.40 (0.09) | 0.40 (0.10) | 0.39 (0.09) | 0.41 (0.09) | 0.39 (0.09) | 0.40 (0.09) |
    |Sum_EucDistance | 80197.76 (30543.57) | 85365.23 (33092.51) | 74043.99 (25904.69) | 65957.33 (27102.52) | 83535.33 (29466.79) | 81444.31 (31802.41) | 77838.70 (28962.06) | 100744.96 (34507.35) | 78455.15 (32118.58) | 76431.03 (31270.20) | 79567.92 (28744.13) |
    |Sum_PathDistance | 99201.08 (37074.91) | 104671.88 (39950.07) | 92686.09 (32159.04) | 83542.74 (33201.15) | 103741.44 (36356.19) | 99280.32 (37919.82) | 96406.34 (35299.56) | 124755.00 (42191.82) | 96341.66 (38802.68) | 93919.13 (37813.49) | 98641.73 (34637.00) |
    |Terminal_degree | 1.25 (0.09) | 1.25 (0.08) | 1.26 (0.10) | 1.26 (0.14) | 1.25 (0.08) | 1.25 (0.08) | 1.26 (0.11) | 1.26 (0.08) | 1.26 (0.08) | 1.26 (0.08) | 1.24 (0.07) |
    |Width | 167.20 (30.56) | 171.68 (31.20) | 161.85 (28.90) | 161.88 (31.43) | 171.47 (29.97) | 163.18 (30.16) | 167.78 (31.63) | 179.88 (32.02) | 161.80 (37.18) | 166.77 (28.64) | 164.63 (28.74) |



  • Continuous variables: correlation analysis

  • Some morphometric features are highly correlated

  • This redundancy needs to be removed before modeling
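
One way to locate redundant features is to scan the correlation matrix for pairs above a cutoff. The sketch below uses simulated data; the 0.9 cutoff and the rule of dropping the second member of each pair are assumptions for illustration, not the report's stated procedure.

```r
set.seed(1)
n <- 200
# Simulated stand-ins for three features; N_tips is built to track N_bifs closely.
n_bifs <- rnorm(n, mean = 24, sd = 8)
feats <- data.frame(
  N_bifs = n_bifs,
  N_tips = n_bifs + rnorm(n, sd = 0.5),
  Height = rnorm(n, mean = 214, sd = 34)
)

cor_mat <- cor(feats)   # Pearson correlation matrix
cutoff  <- 0.9          # assumed threshold, not taken from the report

# List each highly correlated pair once (upper triangle only).
hits <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  feature_1 = rownames(cor_mat)[hits[, "row"]],
  feature_2 = colnames(cor_mat)[hits[, "col"]],
  r         = cor_mat[hits]
)
print(high_pairs)

# Drop the second member of each pair to remove the redundancy.
reduced <- feats[, setdiff(names(feats), high_pairs$feature_2), drop = FALSE]
```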



1.4 Morphometric Analysis: Comparing D1-MSN and D2-MSN morphometrics

  • Neurons come from 5 mouse brains; an empirical Bayes linear model was used to adjust for the brain effect

  • Morphometric comparison (D1 vs. D2 neurons) using a two-sample t-test, with p-values adjusted for multiple comparisons using the Bonferroni method

  • Generate result tables, including mean difference, 95% CI, effect size, test statistic, p-values and adjusted p-values

MSN morphometric by Type

|Morphometric | Mean Difference | 95% CI | Effect size | statistic | p.value | p.adjusted |
|:------------|----------------:|:-------|------------:|----------:|:--------|:-----------|
|Bif_ampl_local | -0.73 | (-1.64, 0.18) | -0.073 | 2.503 | | |
|Bif_ampl_remote | -1.73 | (-2.5, -0.95) | -0.203 | 19.190 | *** | *** |
|Branch_Order | -0.18 | (-0.29, -0.07) | -0.153 | 10.911 | *** | |
|Depth | 8.64 | (5.67, 11.62) | 0.265 | 32.487 | *** | *** |
|Sum_EucDistance | 10458.92 | (7779.79, 13138.06) | 0.355 | 58.620 | *** | *** |
|Max_EucDistance | 12.59 | (10.71, 14.47) | 0.609 | 172.125 | *** | *** |
|Fractal_Dim | 0.00 | (0, 0) | -0.187 | 16.147 | *** | ** |
|Height | 18.02 | (15.1, 20.94) | 0.562 | 146.700 | *** | *** |
|Length | 170.24 | (94.12, 246.36) | 0.204 | 19.241 | *** | *** |
|N_bifs | -0.23 | (-0.97, 0.51) | -0.029 | 0.380 | | |
|N_branch | -0.18 | (-1.71, 1.34) | -0.011 | 0.055 | | |
|N_stems | 0.28 | (0.14, 0.43) | 0.182 | 15.356 | *** | ** |
|N_tips | 0.05 | (-0.74, 0.84) | 0.006 | 0.016 | | |
|Partition_asymmetry | 0.00 | (-0.01, 0.01) | -0.028 | 0.377 | | |
|Sum_PathDistance | 10910.20 | (7649.61, 14170.8) | 0.305 | 43.066 | *** | *** |
|Max_PathDistance | 12.03 | (9.63, 14.44) | 0.456 | 96.402 | *** | *** |
|Width | 9.31 | (6.59, 12.04) | 0.311 | 44.956 | *** | *** |
|Terminal_degree | -0.01 | (-0.02, 0) | -0.098 | 4.438 | | |
|Bif_tilt_local | 0.95 | (0.09, 1.82) | 0.100 | 4.683 | | |
|Bif_tilt_remote | 1.91 | (1.27, 2.55) | 0.272 | 34.281 | *** | *** |
|Bif_torque_local | -1.20 | (-2.12, -0.28) | -0.118 | 6.500 | | |
|Bif_torque_remote | -0.57 | (-1.5, 0.36) | -0.056 | 1.431 | | |
|ABEL_All | 3.41 | (2.69, 4.13) | 0.432 | 86.789 | *** | *** |
|BAPL_All | 3.36 | (2.51, 4.21) | 0.361 | 60.405 | *** | *** |
|ABEL_Internal | 1.89 | (1.49, 2.3) | 0.422 | 82.817 | *** | *** |
|BAPL_Internal | 1.93 | (1.48, 2.38) | 0.388 | 69.904 | *** | *** |
|ABEL_Terminal | 4.29 | (3.18, 5.4) | 0.353 | 57.761 | *** | *** |
|BAPL_Terminal | 4.12 | (2.82, 5.43) | 0.287 | 38.329 | *** | *** |
|Balancing_Factor | 0.02 | (0.01, 0.02) | 0.394 | 72.235 | *** | *** |
|Centripetal_Bias | 0.43 | (0.33, 0.52) | 0.416 | 80.486 | *** | *** |
|Convexity | 0.00 | (0, 0.01) | 0.019 | 0.159 | | |
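
The per-feature comparison and the Bonferroni step can be sketched in base R as follows. This is an illustration on simulated data only: it omits the empirical Bayes brain adjustment, and the choice of Welch's t-test (the `t.test()` default) and of Cohen's d with a pooled SD as the effect size are assumptions, since the report does not state which variants were used.

```r
set.seed(42)
# Simulated data: two features for 100 D1 and 100 D2 neurons.
type <- factor(rep(c("D1", "D2"), each = 100))
dat <- data.frame(
  Height = c(rnorm(100, 223, 35), rnorm(100, 204, 29)),  # built-in group shift
  N_tips = c(rnorm(100, 29, 9),   rnorm(100, 29, 8))     # no group shift
)

compare_one <- function(x, g) {
  d1 <- x[g == "D1"]
  d2 <- x[g == "D2"]
  tt <- t.test(d1, d2)  # Welch two-sample t-test (R default)
  pooled_sd <- sqrt(((length(d1) - 1) * var(d1) + (length(d2) - 1) * var(d2)) /
                      (length(d1) + length(d2) - 2))
  data.frame(
    mean_diff = mean(d1) - mean(d2),
    ci_low    = tt$conf.int[1],
    ci_high   = tt$conf.int[2],
    cohen_d   = (mean(d1) - mean(d2)) / pooled_sd,
    statistic = unname(tt$statistic),
    p.value   = tt$p.value
  )
}

res <- do.call(rbind, lapply(dat, compare_one, g = type))
# Bonferroni: multiply each p-value by the number of tests, capped at 1.
res$p.adjusted <- p.adjust(res$p.value, method = "bonferroni")
print(round(res, 4))
```

Because the Bonferroni-adjusted p-value is never smaller than the raw one, a feature can be significant before adjustment and lose significance after it, but not the reverse.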

  • Morphometric comparison (D1 vs. D2 neurons): generate a heatmap to visualize the D1 vs. D2 differences overall and across brain regions

    • The main differences are in size-related features
    • D1 neurons are larger and have longer branches than D2 neurons
    • The differences are more pronounced in the CPi and CPc regions



    Section 2: D1 vs. D2 supervised classification

    2.1 Data preparation

    • Generate train and test data
    ## Train set has 1496 neurons
    ## Test set has 375 neurons
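
A split of this kind could be produced as sketched below. The 80/20 ratio, the stratification by cell type, and the use of `floor()` are all assumptions: the report gives only the resulting set sizes. The labels are rebuilt from the counts in section 1.3.

```r
set.seed(123)
# Labels rebuilt from the counts reported in section 1.3 (1017 D1, 854 D2).
type <- rep(c("D1", "D2"), times = c(1017, 854))

# Sample 80% of the row indices within each class so both sets keep the D1/D2 ratio.
idx_by_class <- split(seq_along(type), type)
train_idx <- unlist(lapply(idx_by_class, function(i) {
  i[sample.int(length(i), size = floor(0.8 * length(i)))]
}), use.names = FALSE)
test_idx <- setdiff(seq_along(type), train_idx)

cat("Train set has", length(train_idx), "neurons\n")
cat("Test set has", length(test_idx), "neurons\n")
```

Under these assumptions the set sizes come out to 1496 and 375, the same sizes reported above, although that agreement alone does not confirm how the original split was made.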



    • Data scaling, PCA-transformation:
      • Some morphometric features are highly correlated, so we reduce the dimensionality of the data to improve model quality
      • Examine the relationship between explained variance and the number of principal components
      • Keep the smallest number of principal components that together explain at least 95% of the total variance



    • Results show that 15 PCs explain >= 95% of the total variance
    • Transform the data, keeping only these 15 principal components
    • Also calculate class weights, since there is a slight imbalance between the D1 and D2 neuron counts
    ## D1 neurons weight: 0.46
    ## D2 neurons weight: 0.54
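
The component-selection rule and the class weights can be sketched in base R. The PCA part runs on simulated data. The weight formula (inverse class frequency, normalised to sum to 1) is an assumption that happens to be consistent with the weights printed above; the counts used are the full-dataset counts from section 1.3, not the training-set counts.

```r
set.seed(7)
# Simulated stand-in for the training features: 6 columns built from 3 latent
# factors, so several columns are strongly correlated.
latent <- matrix(rnorm(300 * 3), ncol = 3)
x <- cbind(latent, latent + matrix(rnorm(300 * 3, sd = 0.2), ncol = 3))
colnames(x) <- paste0("feat_", 1:6)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the first k components.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_pc    <- which(cum_var >= 0.95)[1]   # smallest k reaching the 95% threshold
x_pca   <- pca$x[, seq_len(n_pc), drop = FALSE]

# Class weights inversely proportional to class frequency, normalised to sum to 1.
n_class <- c(D1 = 1017, D2 = 854)
w <- (1 / n_class) / sum(1 / n_class)
print(round(w, 2))
```

In a real workflow the PCA would be fitted on the training set only and then applied to the test set, so that no information from the test set leaks into the transformation.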



    2.2 Fit random forest base model

    • Start with a base model and check model performance on the unseen test set, including the accuracy, precision, recall, F1-score, and ROC-AUC
    ## 
    ## 
    ## Table: Model Performance Metrics on Test Set
    ## 
    ## |Metric    | Value|
    ## |:---------|-----:|
    ## |accuracy  | 0.637|
    ## |precision | 0.655|
    ## |recall    | 0.706|
    ## |f_meas    | 0.679|
    ## |roc_auc   | 0.721|
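
For reference, the five metrics in the table can be computed from predictions as sketched below. The predictions are toy values, not the model's output, and treating D1 as the positive class is an assumption.

```r
# Toy predicted probabilities of class D1 for 10 neurons (illustrative only).
truth   <- factor(c("D1", "D1", "D1", "D1", "D2", "D2", "D2", "D2", "D1", "D2"),
                  levels = c("D1", "D2"))
prob_d1 <- c(0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.60, 0.10, 0.85, 0.55)
pred    <- factor(ifelse(prob_d1 > 0.5, "D1", "D2"), levels = c("D1", "D2"))

# Confusion matrix with D1 as the positive class.
cm <- table(truth = truth, pred = pred)
tp <- cm["D1", "D1"]; fn <- cm["D1", "D2"]
fp <- cm["D2", "D1"]; tn <- cm["D2", "D2"]

accuracy  <- (tp + tn) / sum(cm)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f_meas    <- 2 * precision * recall / (precision + recall)

# ROC-AUC as the probability that a random D1 neuron scores higher than a
# random D2 neuron (ties count one half).
pos <- prob_d1[truth == "D1"]
neg <- prob_d1[truth == "D2"]
roc_auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))

print(round(c(accuracy = accuracy, precision = precision, recall = recall,
              f_meas = f_meas, roc_auc = roc_auc), 3))
```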



    2.3 Fine-tune the random forest model

    • Use grid search method and train with 3-fold cross validation
    • Parameters to tune:
      • mtry: number of features randomly sampled as candidates at each split
      • trees: number of trees
      • min_n: minimum number of samples in a node required for it to be split further
      • sample_prop: proportion of samples used to grow each tree
      • tree_depth: maximum depth of each tree
    • Make prediction using the fine-tuned model on the unseen test set
    • Compare the model performance: base vs. fine-tuned model
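
The skeleton of a grid search with 3-fold cross-validation is sketched below in base R. To keep it runnable without extra packages, a small k-nearest-neighbour classifier stands in for the random forest, and the single tuned parameter `k` stands in for the parameters listed above; the data are simulated.

```r
set.seed(11)
# Simulated two-class data with two features (stand-ins for PC scores).
n <- 120
y <- rep(c("D1", "D2"), each = n / 2)
x <- cbind(rnorm(n, mean = ifelse(y == "D1", 1, -1)), rnorm(n))

# Stand-in model: k-nearest-neighbour majority vote (NOT a random forest).
knn_predict <- function(x_train, y_train, x_new, k) {
  apply(x_new, 1, function(p) {
    d <- sqrt(colSums((t(x_train) - p)^2))   # distance to every training row
    votes <- y_train[order(d)[seq_len(k)]]
    names(which.max(table(votes)))
  })
}

grid  <- expand.grid(k = c(1, 5, 15))        # hypothetical parameter grid
folds <- sample(rep(1:3, length.out = n))    # random 3-fold assignment

# For each grid row, average the held-out accuracy over the 3 folds.
grid$cv_accuracy <- sapply(grid$k, function(k) {
  mean(sapply(1:3, function(f) {
    held_out <- folds == f
    pred <- knn_predict(x[!held_out, , drop = FALSE], y[!held_out],
                        x[held_out, , drop = FALSE], k)
    mean(pred == y[held_out])
  }))
})

best <- grid[which.max(grid$cv_accuracy), ]
print(grid)
```

In a tidymodels workflow the same structure is usually expressed with `rsample::vfold_cv(v = 3)` and `tune::tune_grid()`; whether this report used exactly those calls is not stated.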



    • The fine-tuned model shows a slight increase in F1-score and other metrics
    • The base model is already good enough for the current dataset
    • The morphometric differences between D1 and D2 neurons are not pronounced
    D1 and D2 MSN shape


    2.4 Feature importance

    • Goal: identify which shape features are important for D1 vs. D2 classification
    • Extract the feature importance (based on impurity decrease); the predictors used were PCs
    • Map the PCs back to original shape features based on loadings
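
One possible way to map PC-level importance back to the original features is to weight each feature's absolute loading by the importance of the corresponding PC. The sketch below uses simulated data and made-up importance values; the report does not give its exact mapping formula, so this is an assumption.

```r
set.seed(3)
# Simulated features and a PCA fitted on them.
x <- matrix(rnorm(200 * 4), ncol = 4,
            dimnames = list(NULL, c("Height", "Width", "N_tips", "Bif_ampl_local")))
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Hypothetical impurity-based importances of the PCs, as a model might report them.
pc_importance <- c(PC1 = 0.50, PC2 = 0.30, PC3 = 0.15, PC4 = 0.05)

# Weight each feature's absolute loading on a PC by that PC's importance,
# sum across PCs, then normalise so the scores add up to 1.
loadings      <- abs(pca$rotation[, names(pc_importance), drop = FALSE])
feature_score <- drop(loadings %*% pc_importance)
feature_score <- sort(feature_score / sum(feature_score), decreasing = TRUE)
print(round(feature_score, 3))
```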

    • Main features separating D1 and D2 neurons are size-related features


    Section 3: Clustering analysis

    3.1 Data preparation

    • Remove redundant features
    • Standardize the features
    • Calculate dissimilarity matrix used for clustering
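
Standardizing before computing distances keeps large-scale features such as Length from dominating small-scale ones such as Convexity. A minimal sketch on simulated data, assuming Euclidean distance (the report does not name the metric):

```r
set.seed(5)
# Simulated feature matrix with two columns on very different scales.
x <- cbind(Length = rnorm(50, 2800, 850), Convexity = rnorm(50, 0.84, 0.06))

x_std <- scale(x)                          # z-score each feature: mean 0, SD 1
d     <- dist(x_std, method = "euclidean") # pairwise dissimilarity (assumed metric)

print(summary(as.vector(d)))
```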


    3.2 Determine the number of clusters

    • Define a grid of parameter combinations
    • Perform hierarchical clustering for each parameter combination
    • Determine the number of clusters based on the silhouette score and discussion with biologists
      • Silhouette score: measures how well a point fits within its own cluster compared to the nearest other cluster
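
The parameter scan can be sketched as follows. The grid (linkage method by number of clusters) is hypothetical, the data are simulated, and the silhouette is computed by hand to avoid extra packages: for each point, a is the mean distance to its own cluster, b is the mean distance to the nearest other cluster, and the score is (b - a) / max(a, b).

```r
set.seed(9)
# Simulated data with three well-separated groups (stand-in for the features).
x <- rbind(matrix(rnorm(40, 0),  ncol = 2),
           matrix(rnorm(40, 6),  ncol = 2),
           matrix(rnorm(40, 12), ncol = 2))
d <- dist(scale(x))

# Mean silhouette width from a distance object and a vector of cluster labels.
mean_silhouette <- function(d, cl) {
  dm <- as.matrix(d)
  s <- sapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)                    # convention for singletons
    a <- sum(dm[i, own]) / (sum(own) - 1)           # mean distance within own cluster
    b <- min(tapply(dm[i, !own], cl[!own], mean))   # nearest other cluster
    (b - a) / max(a, b)
  })
  mean(s)
}

# Hypothetical grid: linkage method x number of clusters.
grid <- expand.grid(method = c("ward.D2", "average", "complete"),
                    k = 2:6, stringsAsFactors = FALSE)
grid$silhouette <- mapply(function(m, k) {
  mean_silhouette(d, cutree(hclust(d, method = m), k = k))
}, grid$method, grid$k)

print(grid[order(-grid$silhouette), ][1:3, ])
```

The score alone rarely settles the choice; as in the report, candidates with similar scores can be compared on how interpretable the resulting clusters are.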



    • Decided to keep 7 cell clusters
    • Yields a relatively high silhouette score
    • Makes biological sense to the biologists, based on the cluster distribution in the 3D mouse brain (viewed in the dashboard prototype)


    3.3 Final clustering

    • Perform the clustering
    • Generate dendrogram
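
A minimal sketch of the final step on simulated data: fit the hierarchical clustering, cut the tree into the 7 clusters chosen in section 3.2, and build the dendrogram object. The Ward linkage used here is an assumption; the report does not state the final parameter combination.

```r
set.seed(21)
x  <- scale(matrix(rnorm(60 * 3), ncol = 3))  # simulated standardized features
hc <- hclust(dist(x), method = "ward.D2")     # linkage method is an assumption

clusters <- cutree(hc, k = 7)                 # 7 clusters, as chosen in section 3.2
print(table(clusters))

dend <- as.dendrogram(hc)                     # plot(dend) draws the tree
```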

    3.4 Morphometric distribution across cell clusters

    • Visualize cell structures in each cluster
      • Randomly sample cells from each cluster
      • Visualize the cell shape on same scale
      • Identified 7 subgroups with distinct morphometric signatures
    Randomly sampled cells from each cluster


    3.5 Visualize the spatial distribution of cell clusters

    • How are these clusters distributed in the 3D mouse brain?
    • Generate plot for spatial distribution of cells
    Spatial distribution of cells


    • Clusters have
      • Distinct morphometric characteristics
      • Interesting spatial patterns


    3.6 R Shiny dashboard

    • An interactive tool for biologists, brain anatomists, computational biologists to explore complex research findings together
    • New biological insights: cell shape – brain structure – brain functionality



    Achievements & Outcome

    • Validated the D1 and D2 shape differences at scale: from dozens of neurons in prior studies to 2,400+ in this study
    • Derived new scientific insights into cell shapes and brain structure-function
    • Provided a blueprint for understanding which neurons are most vulnerable in disease, pointing to more precise, circuit-specific strategies for drug design and testing
    cell shape - brain structure - function


    • Contributed to a paper recently accepted by Nature Neuroscience
    NN Accepted Paper



    Discussion & Future Work

    • Study axon shape (in addition to dendrites)
    • Validate on more datasets (as data is being continuously generated)
    • Incorporate transcriptomic, electrophysiology data



    Acknowledgements

    Dr. William Yang (PI)
    Dr. Chris Park (Project Scientist)
    Dr. Masood Akram (Postdoctoral Scholar)
    Dr. Peter Langfelder (Statistician)


    • Collaborators: Dr. Hongwei Dong, Dr. Daniel Tward, Dr. Jason Cong, Dr. Giorgio Ascoli, Nicholas Foster, Andrew Bennecke, Karl Marrett, Keivan Moradi, Sumit Nanda


    Thank You