Step 10: Neuron Pseudotime Analysis

Damage trajectory analysis using current neuron Seurat object

Setup: Environment and Data

In this step, we start from the current neuron Seurat object generated in Step 09.
The goal is to infer a neuronal damage trajectory rather than a developmental trajectory.

We use the neuron-subclustered Seurat object as the starting point.

library(Seurat)
library(tidyverse)
library(patchwork)
library(SingleCellExperiment)
library(slingshot)
library(tradeSeq)

# Load current neuron Seurat object
import_path <- "/Users/yoshimurasouhei/Downloads/010_school/4年生/bioinfomaticsリサーチクラークシップ/PD_2026/scripts/SO_09_NeuronSub.RDS"

SO_neuron <- readRDS(import_path)

SO_neuron
An object of class Seurat 
47773 features across 1413 samples within 1 assay 
Active assay: RNA (47773 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

Step 10.1: Check Metadata and Reductions

Before trajectory analysis, we confirm that the object contains UMAP, PCA, clusters, and condition metadata.

# Check metadata columns
colnames(SO_neuron@meta.data)
 [1] "orig.ident"        "nCount_RNA"        "nFeature_RNA"     
 [4] "Sample"            "condition"         "Sex"              
 [7] "scDblFinder.score" "scDblFinder.class" "percent.mt"       
[10] "merge_group"       "S.Score"           "G2M.Score"        
[13] "Phase"             "old.ident"         "RNA_snn_res.0.5"  
[16] "seurat_clusters"   "RNA_snn_res.0.3"  
# Check reductions
Reductions(SO_neuron)
[1] "pca"  "umap"
# Check cluster distribution
table(SO_neuron$seurat_clusters)

  0   1   2   3   4   5   6   7 
374 344 314 163 100  49  35  34 
# Check condition distribution
table(SO_neuron$condition)

  C  PD 
986 427 
p1 <- DimPlot(
  SO_neuron,
  reduction = "umap",
  group.by = "seurat_clusters",
  label = TRUE,
  repel = TRUE,
  pt.size = 0.4
) + labs(title = "Neuron Subclusters")

p2 <- DimPlot(
  SO_neuron,
  reduction = "umap",
  group.by = "condition",
  pt.size = 0.4,
  cols = c("C" = "dodgerblue", "PD" = "firebrick")
) + labs(title = "Condition Distribution")

p1 | p2

Step 10.2: Confirm Neuronal Identity

Because the previous annotation included excitatory and inhibitory neurons, we first validate neuronal marker expression.

Excitatory neuron markers:

SLC17A7, CAMK2A, NEUROD1, GRM4, UNC13C

Inhibitory neuron markers:

GAD1, GAD2, EBF3

Dopaminergic neuron markers:

TH, SLC6A3, DDC, SLC18A2, NR4A2
FeaturePlot(
  SO_neuron,
  features = c(
    "SLC17A7", "CAMK2A", "NEUROD1",
    "GAD1", "GAD2", "EBF3",
    "TH", "SLC6A3", "DDC", "SLC18A2", "NR4A2"
  ),
  ncol = 4,
  pt.size = 0.4
)

Step 10.3: Add Neuronal Damage Scores

Here, we create module scores for PD-related neuronal damage.

These scores are not the final answer.
They are used to define the biological direction of the trajectory.

mitochondrial_genes <- c(
  "PINK1", "PRKN", "PARK7", "ATP5F1A", "NDUFS1", "NDUFB8", "COX4I1"
)

oxidative_stress_genes <- c(
  "HMOX1", "NQO1", "SOD2", "TXN", "FOS", "JUN", "DDIT3"
)

er_stress_genes <- c(
  "HSPA5", "ATF4", "ATF6", "XBP1", "ERN1", "DDIT3"
)

apoptosis_genes <- c(
  "BAX", "CASP3", "CASP9", "BCL2L11", "BBC3"
)

pd_risk_genes <- c(
  "SNCA", "LRRK2", "PINK1", "PRKN", "PARK7", "GBA", "MAPT"
)

excitatory_identity_genes <- c(
  "SLC17A7", "CAMK2A", "NEUROD1", "GRM4", "UNC13C"
)

inhibitory_identity_genes <- c(
  "GAD1", "GAD2", "EBF3"
)

pd_damage_genes <- unique(c(
  mitochondrial_genes,
  oxidative_stress_genes,
  er_stress_genes,
  apoptosis_genes,
  pd_risk_genes
))
# Keep only genes present in the Seurat object
pd_damage_genes_present <- intersect(pd_damage_genes, rownames(SO_neuron))
excitatory_genes_present <- intersect(excitatory_identity_genes, rownames(SO_neuron))
inhibitory_genes_present <- intersect(inhibitory_identity_genes, rownames(SO_neuron))

pd_damage_genes_present
 [1] "PINK1"   "PRKN"    "PARK7"   "ATP5F1A" "NDUFS1"  "NDUFB8"  "COX4I1" 
 [8] "HMOX1"   "NQO1"    "SOD2"    "TXN"     "FOS"     "JUN"     "DDIT3"  
[15] "HSPA5"   "ATF4"    "ATF6"    "XBP1"    "ERN1"    "BAX"     "CASP3"  
[22] "CASP9"   "BCL2L11" "BBC3"    "SNCA"    "LRRK2"   "MAPT"   
excitatory_genes_present
[1] "SLC17A7" "CAMK2A"  "NEUROD1" "GRM4"    "UNC13C" 
inhibitory_genes_present
[1] "GAD1" "GAD2" "EBF3"
SO_neuron <- AddModuleScore(
  SO_neuron,
  features = list(pd_damage_genes_present),
  name = "PD_damage_score"
)

SO_neuron <- AddModuleScore(
  SO_neuron,
  features = list(excitatory_genes_present),
  name = "Excitatory_identity_score"
)

SO_neuron <- AddModuleScore(
  SO_neuron,
  features = list(inhibitory_genes_present),
  name = "Inhibitory_identity_score"
)
p1 <- FeaturePlot(
  SO_neuron,
  features = "PD_damage_score1",
  pt.size = 0.4
) + labs(title = "PD Damage Score")

p2 <- VlnPlot(
  SO_neuron,
  features = "PD_damage_score1",
  group.by = "condition",
  pt.size = 0
) + labs(title = "PD Damage Score by Condition")

p1 | p2

Step 10.4: Decide Whether to Analyze All Neurons or One Neuronal Lineage

Important warning:

If excitatory and inhibitory neurons are analyzed together, pseudotime may capture cell identity differences rather than disease progression.

Therefore, we first check the relationship between subclusters, condition, and neuronal identity scores.

table(SO_neuron$seurat_clusters, SO_neuron$condition)
   
      C  PD
  0 329  45
  1 228 116
  2 239  75
  3  18 145
  4 100   0
  5  31  18
  6  35   0
  7   6  28
p1 <- VlnPlot(
  SO_neuron,
  features = "Excitatory_identity_score1",
  group.by = "seurat_clusters",
  pt.size = 0
) + labs(title = "Excitatory Identity by Cluster")

p2 <- VlnPlot(
  SO_neuron,
  features = "Inhibitory_identity_score1",
  group.by = "seurat_clusters",
  pt.size = 0
) + labs(title = "Inhibitory Identity by Cluster")

p1 | p2

Step 10.5: Choose Cells for Trajectory

For the first pseudotime test, we use all neuron subclusters.

Later, this should be repeated separately for excitatory and inhibitory neurons.

SO_traj <- SO_neuron

SO_traj
An object of class Seurat 
47773 features across 1413 samples within 1 assay 
Active assay: RNA (47773 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

Optional: excitatory-only trajectory.

SO_traj <- subset(
  SO_neuron,
  subset = Excitatory_identity_score1 > Inhibitory_identity_score1
)

Optional: inhibitory-only trajectory.

SO_traj <- subset(
  SO_neuron,
  subset = Inhibitory_identity_score1 > Excitatory_identity_score1
)

Step 10.6: Convert Seurat Object to SingleCellExperiment

Slingshot works with SingleCellExperiment.

We use the existing UMAP coordinates and Seurat clusters.

sce <- as.SingleCellExperiment(SO_traj)

# Add UMAP coordinates from Seurat to SingleCellExperiment
reducedDims(sce)$UMAP <- Embeddings(SO_traj, "umap")

# Store cluster labels
sce$cluster <- as.character(SO_traj$seurat_clusters)

# Store condition
sce$condition <- SO_traj$condition

# Store PD damage score
sce$PD_damage_score <- SO_traj$PD_damage_score1

Step 10.7: Run Slingshot Pseudotime

We infer lineage structure using UMAP and Seurat clusters.

The starting cluster should ideally be:

control-enriched
low PD damage score
healthy-like neuron cluster

First, we identify candidate root clusters.

root_summary <- SO_traj@meta.data %>%
  mutate(cluster = as.character(seurat_clusters)) %>%
  group_by(cluster) %>%
  summarise(
    n_cells = n(),
    mean_PD_damage = mean(PD_damage_score1, na.rm = TRUE),
    percent_PD = mean(condition == "PD", na.rm = TRUE),
    percent_C = mean(condition == "C", na.rm = TRUE)
  ) %>%
  arrange(mean_PD_damage)

root_summary
# A tibble: 8 × 5
  cluster n_cells mean_PD_damage percent_PD percent_C
  <chr>     <int>          <dbl>      <dbl>     <dbl>
1 3           163        -0.102       0.890     0.110
2 7            34        -0.0925      0.824     0.176
3 5            49        -0.0807      0.367     0.633
4 2           314        -0.0668      0.239     0.761
5 1           344        -0.0554      0.337     0.663
6 4           100        -0.0450      0         1    
7 6            35         0.0492      0         1    
8 0           374         0.0873      0.120     0.880

Choose the cluster with low damage score and high control percentage as the starting cluster.

start_cluster <- root_summary$cluster[5]

start_cluster
[1] "1"
sce <- slingshot(
  sce,
  clusterLabels = "cluster",
  reducedDim = "UMAP",
  start.clus = start_cluster
)

Step 10.8: Visualize Slingshot Trajectory

umap <- reducedDims(sce)$UMAP

plot(
  umap,
  col = as.factor(sce$condition),
  pch = 16,
  asp = 1,
  xlab = "UMAP_1",
  ylab = "UMAP_2",
  main = "Slingshot Trajectory on Neuron UMAP"
)

lines(SlingshotDataSet(sce), lwd = 2, col = "black")
legend(
  "topright",
  legend = levels(as.factor(sce$condition)),
  col = seq_along(levels(as.factor(sce$condition))),
  pch = 16
)

Step 10.9: Extract Pseudotime

pseudotime_mat <- slingPseudotime(sce)

head(pseudotime_mat)
                              Lineage1   Lineage2   Lineage3
GSM6459689_AAGTGAAAGGTCTTTG-1 3.055415  3.0408320  3.0704455
GSM6459689_ACAGAAATCGTTCTGC-1 3.259653  3.2628823  3.2585519
GSM6459689_ACAGCCGCAACTTGCA-1 1.454322  1.4531150  1.4539668
GSM6459689_AGAACAAGTCCAGCCA-1 0.461310  0.4610136  0.4612008
GSM6459689_AGCTTCCTCGTTCTAT-1       NA 12.4407873 11.7222602
GSM6459689_AGTGTTGTCTCCGTGT-1 1.945734  1.9477154  1.9481892

If multiple lineages exist, we use the first lineage for the initial analysis.

SO_traj$pseudotime_slingshot <- pseudotime_mat[, 1]

summary(SO_traj$pseudotime_slingshot)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.     NAs 
  0.000   4.208  19.972  15.009  22.972  31.098     131 
FeaturePlot(
  SO_traj,
  features = "pseudotime_slingshot",
  reduction = "umap",
  pt.size = 0.5
) + labs(title = "Slingshot Pseudotime")

Step 10.10: Validate Biological Direction

Pseudotime is useful only if it reflects neuronal damage.

Therefore, PD damage score should generally increase along pseudotime.

pt_df <- SO_traj@meta.data %>%
  rownames_to_column("cell_id") %>%
  filter(!is.na(pseudotime_slingshot))

ggplot(pt_df, aes(x = pseudotime_slingshot, y = PD_damage_score1, color = condition)) +
  geom_point(alpha = 0.25, size = 0.5) +
  geom_smooth(method = "loess", se = TRUE) +
  theme_classic() +
  labs(
    title = "PD Damage Score Along Pseudotime",
    x = "Slingshot pseudotime",
    y = "PD damage score"
  )
`geom_smooth()` using formula = 'y ~ x'

ggplot(pt_df, aes(x = pseudotime_slingshot, fill = condition)) +
  geom_density(alpha = 0.4) +
  theme_classic() +
  labs(
    title = "Condition Distribution Along Pseudotime",
    x = "Slingshot pseudotime",
    y = "Density"
  )

Step 10.11: Fit Gene Expression Along Pseudotime

We use tradeSeq to identify genes whose expression changes along pseudotime.

For memory safety, we first test this on variable genes.

counts <- GetAssayData(SO_traj, assay = "RNA", layer = "counts")

# Start with only 500 variable genes for speed
genes_use <- head(VariableFeatures(SO_traj), 500)
genes_use <- intersect(genes_use, rownames(counts))

counts_use <- counts[genes_use, ]

# Extract pseudotime and cell weights
pt <- slingPseudotime(sce)
cw <- slingCurveWeights(sce)

# Keep only cells with non-NA pseudotime in at least one lineage
cells_keep <- rowSums(!is.na(pt)) > 0

# Subset counts, pseudotime, and cell weights
counts_use <- counts_use[, cells_keep]
pt_use <- pt[cells_keep, , drop = FALSE]
cw_use <- cw[cells_keep, , drop = FALSE]

# Replace remaining NA pseudotime values with 0
# and set corresponding weights to 0
pt_use[is.na(pt_use)] <- 0
cw_use[is.na(cw_use)] <- 0

dim(counts_use)
[1]  500 1413
dim(pt_use)
[1] 1413    3
dim(cw_use)
[1] 1413    3
set.seed(123)

gam_fit <- fitGAM(
  counts = counts_use,
  pseudotime = pt_use,
  cellWeights = cw_use,
  nknots = 4,
  verbose = TRUE
)
assoc_res <- associationTest(gam_fit)

assoc_res <- assoc_res %>%
  rownames_to_column("gene") %>%
  arrange(pvalue)

head(assoc_res, 20)
              gene waldStat df pvalue  meanLogFC
1            FOXP2 111.8810  9      0  2.0077442
2             TAC3 111.0943  9      0  3.8950463
3              SST 155.3165  9      0  2.5503998
4           CXCL14 206.9880  9      0 12.9936887
5             GPC5 214.7237  9      0  2.3663550
6             NEFH 113.0223  9      0  1.4105611
7             GRM8 140.4386  9      0  4.2552905
8             GPC6 261.6293  9      0  3.2438202
9             SDK1 168.1491  9      0  3.6891448
10         PCDH11X 124.5028  9      0  3.0967951
11           EPHA3 107.1577  9      0 10.9012681
12            SGCD 122.6612  9      0  3.4531507
13           NELL1 110.6262  9      0  4.7996016
14 ENSG00000227681 153.0025  9      0  4.2740136
15           DOCK8 118.9523  9      0  0.9115073
16           HTR2C 204.6694  9      0  5.5060383
17           CALB1 257.6907  9      0  1.0875341
18       LINC00299 101.9704  9      0  3.0901442
19           PTPRK 134.8420  9      0  1.9203319
20           GRIK1 308.0610  9      0  4.1588695
write.csv(
  assoc_res,
  "/Users/yoshimurasouhei/Downloads/010_school/4年生/bioinfomaticsリサーチクラークシップ/PD_2026/scripts/Step10_slingshot_pseudotime_genes.csv",
  row.names = FALSE
)

Step 10.12: Extract lncRNA Candidates Along Pseudotime

Here, we perform a temporary lncRNA filter based on gene symbols.

This is not perfect.
Later, we should use GENCODE gene biotype annotation.

lncRNA_pseudotime_candidates <- assoc_res %>%
  filter(grepl("^LINC|^AC[0-9]|^AL[0-9]|^AF[0-9]|AS[0-9]$|-AS", gene)) %>%
  arrange(pvalue)

head(lncRNA_pseudotime_candidates, 20)
          gene  waldStat df       pvalue meanLogFC
1    LINC00299 101.97040  9 0.000000e+00  3.090144
2    ADAM7-AS1 144.33818  9 0.000000e+00  3.076963
3  ADAMTS9-AS2 142.40643  9 0.000000e+00  5.025908
4     DLX6-AS1 191.23552  9 0.000000e+00  7.540361
5    LINC01151 111.68120  9 0.000000e+00  4.493272
6    SATB1-AS1  89.18677  9 2.331468e-15  7.784687
7    LINC01266  83.17463  9 3.774758e-14  2.708751
8    NR2F2-AS1  80.19070  9 1.481038e-13  1.780920
9        FRAS1  79.81292  9 1.760814e-13  3.552578
10   LINC02552  71.25062  9 8.652967e-12  2.585610
11 C1QTNF7-AS1  71.13624  9 9.112378e-12  1.464957
12   LINC03000  69.09894  9 2.286493e-11  8.554817
13   LINC02055  64.84927  9 1.543466e-10  2.445472
14    SGO1-AS1  59.71129  9 1.523850e-09  3.708863
15   LINC02253  56.16850  9 7.288578e-09  1.807413
16   ZFPM2-AS1  55.65706  9 9.126780e-09  1.583751
17   LINC02350  55.15992  9 1.135399e-08  7.063987
18    GDNF-AS1  50.74759  9 7.790926e-08  2.317895
19    OTX2-AS1  49.82321  9 1.162909e-07  9.309310
20   SPRY4-AS1  46.93845  9 4.029622e-07  2.535104
write.csv(
  lncRNA_pseudotime_candidates,
  "/Users/yoshimurasouhei/Downloads/010_school/4年生/bioinfomaticsリサーチクラークシップ/PD_2026/scripts/Step10_lncRNA_pseudotime_candidates.csv",
  row.names = FALSE
)

Step 10.13: Visualize Top Pseudotime lncRNA

if (nrow(lncRNA_pseudotime_candidates) > 0) {
  
  top_lnc <- lncRNA_pseudotime_candidates$gene[1]
  
  expr_df <- FetchData(
    SO_traj,
    vars = c(top_lnc, "pseudotime_slingshot", "condition")
  ) %>%
    rownames_to_column("cell_id") %>%
    filter(!is.na(pseudotime_slingshot))
  
  ggplot(expr_df, aes(x = pseudotime_slingshot, y = .data[[top_lnc]], color = condition)) +
    geom_point(alpha = 0.25, size = 0.5) +
    geom_smooth(method = "loess", se = TRUE) +
    theme_classic() +
    labs(
      title = paste("Top pseudotime-associated lncRNA:", top_lnc),
      x = "Slingshot pseudotime",
      y = "Expression"
    )
  
} else {
  print("No lncRNA candidates found with current filtering.")
}
`geom_smooth()` using formula = 'y ~ x'

Step 10.14: Compare With Previous PD vs C lncRNA Candidates

This step connects Step 09 and Step 10.

The strongest lncRNAs are those that are:

PD vs C differentially expressed
and
dynamic along pseudotime
step09_lncRNA_file <- "/Users/yoshimurasouhei/Downloads/010_school/4年生/bioinfomaticsリサーチクラークシップ/PD_2026/scripts/Step09_lncRNA_candidates.csv"

step09_lncRNAs <- read_csv(step09_lncRNA_file)

overlap_lncRNAs <- intersect(
  step09_lncRNAs$gene,
  lncRNA_pseudotime_candidates$gene
)

overlap_lncRNAs

Step 10.15: Save Object With Pseudotime

save_path <- "/Users/yoshimurasouhei/Downloads/010_school/4年生/bioinfomaticsリサーチクラークシップ/PD_2026/scripts/SO_10_NeuronPseudotime.RDS"

saveRDS(SO_traj, file = save_path)

print("Step 10 complete: neuron pseudotime object saved successfully.")
[1] "Step 10 complete: neuron pseudotime object saved successfully."

Interpretation

This analysis converts the current neuron Seurat object into a pseudotime framework.

The most important validation is not whether Slingshot produces a curve.
The important question is whether the curve has biological meaning.

A useful PD damage trajectory should show:

low pseudotime:
control-enriched, low stress, healthy-like neurons

middle pseudotime:
early stress response, early lncRNA changes

high pseudotime:
PD-enriched, high damage score, apoptosis/stress gene activation

If this pattern is observed, the pseudotime can be used to search for early lncRNA driver candidates.