Step 09: Neuron Deep Dive & lncRNA Discovery

Setup: Environment and Data

In this module, we subset the neuron populations (Excitatory and Inhibitory) from our clean, globally annotated dataset. Since neurons are highly heterogeneous and selectively vulnerable in PD, we must re-cluster them to reveal their distinct subtypes before performing Differential Expression (DE) analysis.

library(Seurat)
library(tidyverse)
library(patchwork)
library(MAST)

# 1. Step 08で検証済みの綺麗なデータをロード
import_path <- "/Users/yoshimurasouhei/Downloads/010_school/4年生/bioinfomaticsリサーチクラークシップ/PD_2026/scripts/SO_08_Validated.RDS"
SO <- readRDS(import_path)

Step 9.1: Subsetting and Re-clustering Neurons

We extract only the neurons and recalculate their specific principal components (PCA). The genes that separate neurons from glia are no longer helpful; we need the genes that separate different types of neurons.

#| label: subset-recluster
#| message: false
#| warning: false
#| fig-width: 12
#| fig-height: 5

# 1. ニューロンだけを抽出
target_neurons <- c("Excitatory Neuron", "Inhibitory Neuron")
SO_neuron <- subset(SO, idents = target_neurons)

# 2. ニューロン特有の変動遺伝子を再計算し、PCAとUMAPをやり直す
SO_neuron <- FindVariableFeatures(SO_neuron, selection.method = "vst", nfeatures = 2000)

Finding variable features for layer counts

SO_neuron <- ScaleData(SO_neuron)

Centering and scaling data matrix

Warning: Different features in new layer data than already exists for
scale.data

SO_neuron <- RunPCA(SO_neuron, npcs = 50, verbose = FALSE)

ElbowPlot(SO_neuron, ndims = 50) + labs(title = "Elbow Plot: Choosing dims = 1:40")

# 解像度(resolution)は0.3〜0.5程度で、大まかなサブタイプを見つける
SO_neuron <- FindNeighbors(SO_neuron, dims = 1:40, verbose = FALSE)
SO_neuron <- FindClusters(SO_neuron, resolution = 0.3, verbose = FALSE) 
SO_neuron <- RunUMAP(SO_neuron, dims = 1:40, verbose = FALSE)

Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session

# 3. 視覚化：新しいクラスタと、PD/Cの分布
p1 <- DimPlot(SO_neuron, reduction = "umap", label = TRUE, pt.size = 0.5) + 
  labs(title = "Neuron Subclusters") + NoLegend()

p2 <- DimPlot(SO_neuron, group.by = "condition", pt.size = 0.5,
              cols = c("C" = "dodgerblue", "PD" = "firebrick")) +
  labs(title = "Condition Distribution")

# 4. Marker validation plot
p3 <- FeaturePlot(SO_neuron, features = c("SLC17A7", "GAD1", "GAD2"), ncol = 3, pt.size = 0.5)

(p1 | p2)

p3

Step 9.2: Differential Expression (PD vs C)

Now that we have isolated the neurons, we perform a global DE analysis comparing PD versus Normal cells within this neuronal subset. We use MAST, which handles single-cell zero-inflation well.

# 比較軸を 'condition' (PD vs C) に設定
Idents(SO_neuron) <- "condition"

# MASTを使ってDE解析 (lncRNAは発現量が低いことが多いので、logfc.thresholdを少し下げる)
neuron_degs <- FindMarkers(SO_neuron, 
                           ident.1 = "PD", 
                           ident.2 = "C", 
                           test.use = "MAST",
                           min.pct = 0.1,
                           logfc.threshold = 0.1)


Done!

Combining coefficients and standard errors

Calculating log-fold changes

Calculating likelihood ratio tests

Refitting on reduced model...


Done!

# 行名（遺伝子名）を列に変換
neuron_degs <- neuron_degs %>% rownames_to_column(var = "gene")

Step 9.3: The lncRNA Sniper

We use regular expressions (grepl) to filter the DE results for gene symbols matching standard lncRNA nomenclature (LINC, AC, AL, etc.).

# lncRNAの命名規則に合致するものを抽出
lncRNA_candidates <- neuron_degs %>%
  filter(grepl("^LINC|^AC[0-9]|^AL[0-9]|^AF[0-9]|-", gene)) %>%
  filter(p_val_adj < 0.05) %>% 
  arrange(desc(abs(avg_log2FC))) 

# トップ10のlncRNA候補を表示
print("Top 10 lncRNA candidates in ALL Neurons:")

[1] "Top 10 lncRNA candidates in ALL Neurons:"

head(lncRNA_candidates, 10)

        gene        p_val avg_log2FC pct.1 pct.2    p_val_adj
1   OTX2-AS1 1.053499e-18   4.658860 0.119 0.010 5.032883e-14
2  LINC02691 3.899785e-20   3.832394 0.133 0.011 1.863045e-15
3   PAX8-AS1 8.610366e-17  -3.770955 0.014 0.146 4.113430e-12
4  LINC03000 2.739142e-20   3.730858 0.194 0.045 1.308570e-15
5   CPVL-AS2 1.598159e-08   1.957741 0.126 0.038 7.634887e-04
6   CHL1-AS2 1.647690e-08   1.917268 0.119 0.033 7.871508e-04
7  LINC02223 1.038637e-16   1.757036 0.211 0.054 4.961880e-12
8  TSHZ3-AS1 4.297419e-07  -1.687591 0.049 0.141 2.053006e-02
9   NOVA1-DT 3.789593e-08   1.668103 0.173 0.073 1.810402e-03
10 LINC02798 1.673963e-11   1.644717 0.358 0.207 7.997025e-07

Step 9.4: Visualizing the Top Candidate

We visualize our top hit across the newly defined neuronal subclusters to see its specificity.

if(nrow(lncRNA_candidates) > 0) {
  top_lncRNA <- lncRNA_candidates$gene[1]
  
  # サブクラスタごとに発現量を比較
  VlnPlot(SO_neuron, features = top_lncRNA, 
          group.by = "seurat_clusters", split.by = "condition",
          pt.size = 0) +
    labs(title = paste("Expression of", top_lncRNA, "across Neuron Subtypes"))
} else {
  print("No significant lncRNAs found with the current thresholds.")
}

The default behaviour of split.by has changed.
Separate violin plots are now plotted side-by-side.
To restore the old behaviour of a single split violin,
set split.plot = TRUE.
      
This message will be shown once per session.

Final Save

save_path <- "/Users/yoshimurasouhei/Downloads/010_school/4年生/bioinfomaticsリサーチクラークシップ/PD_2026/scripts/SO_09_NeuronSub.RDS"
saveRDS(SO_neuron, file = save_path)