Step 12: Astrocyte Pseudotime lncRNA Discovery

Identifying lncRNAs associated with PD-enriched astrocyte reactivity trajectory

Setup: Environment and Data

In Step 11, we identified Slingshot Lineage2 as the main PD-enriched astrocyte reactivity trajectory.

Lineage2 was selected because it showed:

positive correlation with reactive astrocyte score
positive correlation with stress score
positive correlation with ECM remodeling score
PD cells shifted toward later pseudotime

In this step, we identify genes and lncRNAs that dynamically change along this selected astrocyte pseudotime trajectory.

library(Seurat)
library(tidyverse)
library(patchwork)
library(tradeSeq)
library(MAST)

project_dir <- "/Users/yoshimurasouhei/Downloads/010_school/4年生/bioinfomaticsリサーチクラークシップ/PD_lncRNA_Trajectory_2026"

scripts_dir <- file.path(project_dir, "scripts")
results_dir <- file.path(project_dir, "results")

dir.create(scripts_dir, showWarnings = FALSE, recursive = TRUE)
dir.create(results_dir, showWarnings = FALSE, recursive = TRUE)

# Load Step 11 object
import_path <- file.path(
  scripts_dir,
  "SO_11_ReactiveAstrocytePseudotime_Lineage2.RDS"
)

SO_astro <- readRDS(import_path)

SO_astro

An object of class Seurat 
47773 features across 8952 samples within 1 assay 
Active assay: RNA (47773 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

Step 12.1: Confirm Astrocyte Pseudotime Object

Before running downstream analysis, we confirm that the object contains the selected astrocyte pseudotime column.

"astro_pseudotime" %in% colnames(SO_astro@meta.data)

[1] TRUE

summary(SO_astro$astro_pseudotime)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.     NAs 
  0.000   3.634  10.014  12.236  21.145  29.915    3754

table(SO_astro$condition)


   C   PD 
5285 3667

table(SO_astro$seurat_clusters, SO_astro$condition)

   
       C   PD
  0 2115    6
  1 1015  619
  2 1131    2
  3  558  458
  4    6  904
  5  220  675
  6    6  754
  7    0  243
  8  217    3
  9   17    3

FeaturePlot(
  SO_astro,
  features = "astro_pseudotime",
  reduction = "umap",
  pt.size = 0.5
) +
  labs(title = "Selected Astrocyte Pseudotime: Lineage2")

Step 12.2: Prepare tradeSeq Input for Selected Astrocyte Lineage

We only use cells assigned to Lineage2, meaning cells with non-NA astro_pseudotime.

For the first run, we use 500 variable genes for speed and stability.
After confirming that the pipeline works, this can be increased to 1000 or 2000 genes.

counts_astro <- GetAssayData(
  SO_astro,
  assay = "RNA",
  layer = "counts"
)

# Start with 500 genes for safety
genes_use <- head(VariableFeatures(SO_astro), 500)
genes_use <- intersect(genes_use, rownames(counts_astro))

# Keep only cells assigned to the selected pseudotime lineage
cells_keep <- !is.na(SO_astro$astro_pseudotime)

sum(cells_keep)

[1] 5198

counts_use <- counts_astro[genes_use, cells_keep]

# tradeSeq expects a pseudotime matrix
pt_use <- matrix(
  SO_astro$astro_pseudotime[cells_keep],
  ncol = 1
)

# Single selected lineage, so all kept cells have weight 1
cw_use <- matrix(
  1,
  nrow = sum(cells_keep),
  ncol = 1
)

rownames(pt_use) <- colnames(counts_use)
rownames(cw_use) <- colnames(counts_use)

colnames(pt_use) <- "Lineage2"
colnames(cw_use) <- "Lineage2"

dim(counts_use)

[1]  500 5198

dim(pt_use)

[1] 5198    1

dim(cw_use)

[1] 5198    1

Step 12.3: Fit Gene Expression Dynamics Along Astrocyte Pseudotime

We use tradeSeq::fitGAM() to model gene expression changes along the selected astrocyte reactivity pseudotime.

set.seed(123)

astro_gam_fit <- fitGAM(
  counts = counts_use,
  pseudotime = pt_use,
  cellWeights = cw_use,
  nknots = 4,
  verbose = TRUE
)

Step 12.4: Identify Pseudotime-Associated Genes

The association test identifies genes whose expression significantly changes along astrocyte pseudotime.

astro_assoc_res <- associationTest(astro_gam_fit)

astro_assoc_res <- astro_assoc_res %>%
  rownames_to_column("gene") %>%
  arrange(pvalue)

head(astro_assoc_res, 30)

              gene   waldStat df pvalue meanLogFC
1           CHI3L1  982.93738  3      0 3.9896910
2  ENSG00000259846  152.15028  3      0 2.6383251
3           OPALIN  187.72242  3      0 2.8543767
4             SGCZ   80.73986  3      0 2.7456176
5           PDE10A  291.36466  3      0 1.9574506
6         C11orf96  108.06547  3      0 0.8421276
7             CPS1   89.44401  3      0 1.0394880
8            GRID2  674.78528  3      0 4.9875663
9            SLIT2  110.54728  3      0 1.2020808
10         SLC14A1 1695.30620  3      0 2.2937542
11 ENSG00000303264  122.97473  3      0 2.1944237
12           DPP10 3875.79850  3      0 3.5109410
13       LINC01608  869.04916  3      0 6.7669663
14          IGFBP5  368.71115  3      0 3.3703898
15           HSPH1  307.34451  3      0 0.8078828
16       LINC01094  693.66493  3      0 4.1224676
17       LINC01099  299.48556  3      0 2.4926251
18       LINC00499  226.30833  3      0 3.2894351
19            ANO4 1690.29392  3      0 3.2095007
20 ENSG00000304278   88.63230  3      0 3.4200423
21            PDK4 1190.09688  3      0 1.5698652
22         SAMMSON  180.84965  3      0 1.6171738
23         ZSCAN31  177.52026  3      0 2.4260423
24          SLC1A2 2448.19668  3      0 2.6549316
25            EMP1  175.84857  3      0 3.2475279
26         ANGPTL4 1756.55494  3      0 1.4067795
27 ENSG00000286271  338.81532  3      0 3.0099288
28             VIM  569.56500  3      0 1.8550658
29           LAMA2 1907.65809  3      0 2.8224074
30         ITPRID1  139.94461  3      0 0.7609342

write.csv(
  astro_assoc_res,
  file.path(results_dir, "Step12_Lineage2_astro_pseudotime_genes.csv"),
  row.names = FALSE
)

Step 12.5: Define Helper Functions

lncRNA_regex <- "^LINC|^AC[0-9]|^AL[0-9]|^AF[0-9]|-AS[0-9]*$"

plot_pseudotime_genes <- function(object, genes, output_name, width = 12, height = 8) {
  genes <- intersect(genes, rownames(object))
  
  plot_list <- lapply(genes, function(gene_i) {
    expr_df <- FetchData(
      object,
      vars = c(gene_i, "astro_pseudotime", "condition")
    ) %>%
      rownames_to_column("cell_id") %>%
      filter(!is.na(astro_pseudotime))
    
    ggplot(
      expr_df,
      aes(
        x = astro_pseudotime,
        y = .data[[gene_i]],
        color = condition
      )
    ) +
      geom_point(alpha = 0.2, size = 0.4) +
      geom_smooth(method = "loess", se = TRUE) +
      theme_classic() +
      labs(
        title = gene_i,
        x = "Astrocyte pseudotime",
        y = "Expression"
      )
  })
  
  p <- wrap_plots(plot_list, ncol = 2)
  
  print(p)
  
  ggsave(
    file.path(results_dir, output_name),
    plot = p,
    width = width,
    height = height,
    dpi = 300
  )
  
  invisible(p)
}

plot_feature_genes <- function(object, genes, output_name, width = 12, height = 8) {
  genes <- intersect(genes, rownames(object))
  
  p <- FeaturePlot(
    object,
    features = genes,
    reduction = "umap",
    ncol = 2,
    pt.size = 0.3
  )
  
  print(p)
  
  ggsave(
    file.path(results_dir, output_name),
    plot = p,
    width = width,
    height = height,
    dpi = 300
  )
  
  invisible(p)
}

plot_vln_genes <- function(object, genes, output_name, width = 12, height = 6) {
  genes <- intersect(genes, rownames(object))
  
  p <- VlnPlot(
    object,
    features = genes,
    group.by = "condition",
    pt.size = 0,
    ncol = 2
  )
  
  print(p)
  
  ggsave(
    file.path(results_dir, output_name),
    plot = p,
    width = width,
    height = height,
    dpi = 300
  )
  
  invisible(p)
}

Step 12.6: Extract Pseudotime-Associated lncRNAs

astro_lncRNA_candidates <- astro_assoc_res %>%
  filter(grepl(lncRNA_regex, gene)) %>%
  arrange(pvalue)

head(astro_lncRNA_candidates, 30)

          gene   waldStat df pvalue meanLogFC
1    LINC01608  869.04916  3      0 6.7669663
2    LINC01094  693.66493  3      0 4.1224676
3    LINC01099  299.48556  3      0 2.4926251
4    LINC00499  226.30833  3      0 3.2894351
5    LINC02343  615.16034  3      0 4.2127601
6    LINC01505 3306.52858  3      0 3.0582131
7    LINC02125   86.24007  3      0 1.8540568
8    LINC02232  308.28349  3      0 2.7124155
9   DNAH17-AS1  278.15524  3      0 4.0697675
10   LINC01088 2808.97934  3      0 1.6106969
11    AQP4-AS1  969.05117  3      0 2.4594468
12   LINC02328  188.96578  3      0 0.9968300
13   LINC01515  219.29259  3      0 3.5141613
14   LINC02511  347.94982  3      0 2.6197823
15   PACRG-AS3  164.52903  3      0 1.7893871
16   LINC02882 2441.14670  3      0 2.2161777
17   LINC01098  219.28226  3      0 1.9480600
18   LINC00299 1147.81186  3      0 3.0594976
19   LINC01170 1619.59939  3      0 2.3062977
20   LINC01060  276.10944  3      0 4.9829177
21   LINC02814  119.63894  3      0 1.8109815
22 ADAMTS9-AS2  679.32206  3      0 3.1004661
23   BAALC-AS1  181.84153  3      0 0.9767224
24    RORB-AS1   86.53763  3      0 0.9054334
25 SLC44A3-AS1  759.08045  3      0 1.8314150
26   LINC01414  107.25916  3      0 1.6987073
27   LINC01122  247.45172  3      0 1.0827693
28    OBI1-AS1 2263.01118  3      0 3.1037328
29   LINC00472  733.52675  3      0 1.2157159
30   LINC01266  118.81385  3      0 1.1746458

write.csv(
  astro_lncRNA_candidates,
  file.path(results_dir, "Step12_Lineage2_astro_lncRNA_pseudotime_candidates.csv"),
  row.names = FALSE
)

Step 12.7: Validate Trajectory Using Known Reactive Astrocyte Genes

known_reactive_genes <- c(
  "CHI3L1", "TNC", "HSPB1", "VIM", "GFAP",
  "C3", "SERPINA3", "LCN2", "SLC1A2", "AQP4"
)

plot_pseudotime_genes(
  object = SO_astro,
  genes = known_reactive_genes,
  output_name = "Step12_known_reactive_genes_pseudotime.png",
  width = 12,
  height = 8
)

`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

Step 12.8: PD vs Control Differential Expression Within Astrocytes

Idents(SO_astro) <- "condition"

astro_PD_vs_C <- FindMarkers(
  SO_astro,
  ident.1 = "PD",
  ident.2 = "C",
  test.use = "MAST",
  min.pct = 0.1,
  logfc.threshold = 0.1
)

astro_PD_vs_C <- astro_PD_vs_C %>%
  rownames_to_column("gene") %>%
  arrange(p_val_adj)

head(astro_PD_vs_C, 30)

              gene         p_val avg_log2FC pct.1 pct.2     p_val_adj
1           MT-ND3  0.000000e+00  1.2823463 0.920 0.815  0.000000e+00
2           MALAT1 5.282373e-317 -0.7290655 0.999 0.999 2.523548e-312
3            NEAT1 4.537160e-306  1.3179548 0.981 0.968 2.167538e-301
4            PWRN1 4.749422e-274 -3.1430350 0.054 0.347 2.268942e-269
5           CHI3L1 4.490183e-268  3.8366184 0.315 0.062 2.145095e-263
6            PRKCA 8.787588e-262  1.2444778 0.872 0.718 4.198095e-257
7            HEPN1 3.880468e-258 -1.7806285 0.257 0.555 1.853816e-253
8            MACF1 7.945422e-254  1.3632633 0.940 0.907 3.795767e-249
9           SAMD4A 1.516207e-251  1.7745594 0.724 0.531 7.243377e-247
10          DBNDD2 8.740907e-244 -1.3884844 0.425 0.628 4.175794e-239
11            PLLP 2.003039e-235 -1.7879997 0.265 0.498 9.569119e-231
12           STMN4 2.284882e-228 -1.4136064 0.365 0.593 1.091557e-223
13          CTNNA3 5.550843e-223 -0.7614270 0.506 0.817 2.651804e-218
14           S100B 1.652391e-220 -1.2437537 0.587 0.795 7.893969e-216
15           HSPH1 7.578305e-217  2.9738275 0.437 0.184 3.620383e-212
16            JUND 2.488657e-213 -1.2486629 0.372 0.650 1.188906e-208
17            FTH1 4.601239e-192 -1.1109748 0.675 0.836 2.198150e-187
18           PDE1A 5.251453e-189 -1.6952818 0.244 0.490 2.508777e-184
19          PLAAT3 9.001521e-188 -1.0745548 0.467 0.676 4.300297e-183
20           H3-3B 9.807605e-185 -1.2038071 0.422 0.658 4.685387e-180
21         TSC22D4 1.471226e-183 -1.0480836 0.452 0.673 7.028487e-179
22          SORBS1 1.248329e-178  1.4263674 0.683 0.420 5.963641e-174
23           STMN1 3.540918e-176 -1.1381124 0.449 0.647 1.691603e-171
24 ENSG00000228793 3.998695e-175 -1.4160049 0.258 0.521 1.910297e-170
25           PRRX1 2.301933e-174  2.2814079 0.332 0.110 1.099702e-169
26            XIST 3.929036e-173  1.1839861 0.629 0.533 1.877018e-168
27            NAV2 4.665551e-172  1.2823549 0.826 0.704 2.228874e-167
28        SERPINH1 1.035282e-171  4.3311943 0.184 0.022 4.945854e-167
29             DST 4.368316e-170  0.7615371 0.947 0.963 2.086876e-165
30            MSRA 6.707360e-168  1.4207225 0.603 0.395 3.204307e-163

write.csv(
  astro_PD_vs_C,
  file.path(results_dir, "Step12_astro_PD_vs_C_DEGs.csv"),
  row.names = FALSE
)

Step 12.9: Extract PD-Associated Astrocyte lncRNAs

astro_PD_lncRNAs <- astro_PD_vs_C %>%
  filter(grepl(lncRNA_regex, gene)) %>%
  filter(p_val_adj < 0.05) %>%
  arrange(p_val_adj)

head(astro_PD_lncRNAs, 30)

          gene         p_val avg_log2FC pct.1 pct.2     p_val_adj
1  SLC44A3-AS1 1.126642e-146  2.4622133 0.285 0.102 5.382306e-142
2    LINC00844 3.863171e-112 -1.2213540 0.236 0.444 1.845553e-107
3    LINC-PINT 3.591696e-106  0.7534983 0.865 0.829 1.715861e-101
4    PTCHD1-AS  7.070613e-88  1.1357397 0.452 0.261  3.377844e-83
5    LINC01470  1.962534e-83 -0.7041811 0.221 0.413  9.375612e-79
6     HOXB-AS1  2.399542e-80 -2.7982213 0.031 0.134  1.146333e-75
7    LINC01505  3.537316e-80 -1.0777578 0.307 0.480  1.689882e-75
8    LINC01515  1.433890e-79  0.9667813 0.170 0.059  6.850122e-75
9    LINC03048  1.848749e-72 -1.6287955 0.098 0.227  8.832030e-68
10   LINC01170  2.140525e-70 -0.9802111 0.248 0.421  1.022593e-65
11   LINC00609  4.122469e-70 -0.5969034 0.483 0.641  1.969427e-65
12  ZBTB47-AS1  6.724439e-70 -1.0224971 0.244 0.414  3.212466e-65
13  TMEM44-AS1  1.677057e-66 -1.5452032 0.058 0.177  8.011802e-62
14   LINC03051  3.870773e-66  0.9243851 0.493 0.328  1.849185e-61
15   LINC03122  6.859944e-64 -0.7440221 0.287 0.461  3.277201e-59
16   PCDH9-AS2  3.771488e-63 -0.3124045 0.289 0.453  1.801753e-58
17 ADAMTS9-AS2  3.248371e-62  1.1485216 0.347 0.197  1.551844e-57
18    DLG2-AS2  8.162215e-61 -0.9759822 0.194 0.348  3.899335e-56
19   BAALC-AS1  3.200840e-60  2.0182523 0.176 0.065  1.529137e-55
20  PARD6G-AS1  1.993784e-59  2.9136746 0.100 0.026  9.524902e-55
21   LINC02821  1.507991e-58 -0.9454386 0.189 0.336  7.204126e-54
22  SEMA6A-AS1  2.404788e-58 -0.9760305 0.211 0.357  1.148840e-53
23   PRKCQ-AS1  1.184379e-54 -1.5043186 0.065 0.172  5.658136e-50
24   LINC00472  1.054694e-53  1.2016452 0.315 0.193  5.038591e-49
25   LINC01088  4.740920e-53  0.6842325 0.497 0.334  2.264880e-48
26   LMCD1-AS1  4.515074e-51 -0.2977500 0.660 0.803  2.156986e-46
27   LINC02882  1.561107e-49 -0.9978768 0.202 0.335  7.457875e-45
28   GNA14-AS1  1.204784e-47 -1.3094606 0.053 0.148  5.755617e-43
29   LINC01877  1.325671e-46 -0.8506140 0.199 0.335  6.333126e-42
30   LINC02256  8.182473e-41 -0.7718557 0.140 0.256  3.909013e-36

write.csv(
  astro_PD_lncRNAs,
  file.path(results_dir, "Step12_astro_PD_vs_C_lncRNAs.csv"),
  row.names = FALSE
)

Step 12.10: Prioritize lncRNAs Associated With Both Pseudotime and PD Status

priority_astro_lncRNAs <- astro_lncRNA_candidates %>%
  inner_join(
    astro_PD_lncRNAs %>%
      select(
        gene,
        p_val_adj_PD_vs_C = p_val_adj,
        avg_log2FC_PD_vs_C = avg_log2FC,
        pct.1,
        pct.2
      ),
    by = "gene"
  ) %>%
  arrange(pvalue)

priority_astro_lncRNAs

          gene   waldStat df       pvalue meanLogFC p_val_adj_PD_vs_C
1    LINC01094  693.66493  3 0.000000e+00 4.1224676      2.908536e-02
2    LINC01505 3306.52858  3 0.000000e+00 3.0582131      1.689882e-75
3   DNAH17-AS1  278.15524  3 0.000000e+00 4.0697675      3.574158e-05
4    LINC01088 2808.97934  3 0.000000e+00 1.6106969      2.264880e-48
5     AQP4-AS1  969.05117  3 0.000000e+00 2.4594468      2.169236e-07
6    LINC01515  219.29259  3 0.000000e+00 3.5141613      6.850122e-75
7    LINC02511  347.94982  3 0.000000e+00 2.6197823      7.493495e-10
8    LINC02882 2441.14670  3 0.000000e+00 2.2161777      7.457875e-45
9    LINC00299 1147.81186  3 0.000000e+00 3.0594976      1.146469e-27
10   LINC01170 1619.59939  3 0.000000e+00 2.3062977      1.022593e-65
11 ADAMTS9-AS2  679.32206  3 0.000000e+00 3.1004661      1.551844e-57
12   BAALC-AS1  181.84153  3 0.000000e+00 0.9767224      1.529137e-55
13 SLC44A3-AS1  759.08045  3 0.000000e+00 1.8314150     5.382306e-142
14   LINC01122  247.45172  3 0.000000e+00 1.0827693      4.709572e-08
15   LINC00472  733.52675  3 0.000000e+00 1.2157159      5.038591e-49
16   LINC00836  262.37537  3 0.000000e+00 4.3277080      1.676091e-13
17   NR2F2-AS1  143.88055  3 0.000000e+00 1.7570634      5.176881e-27
18  KCNMB2-AS1  328.44289  3 0.000000e+00 0.9718464      6.041180e-08
19   LINC03051 1889.51822  3 0.000000e+00 1.8830094      1.849185e-61
20  CALCRL-AS1   61.76679  3 2.464695e-13 0.5209855      3.209818e-10
21    SGO1-AS1   27.56501  3 4.481561e-06 0.3022448      8.790388e-03
   avg_log2FC_PD_vs_C pct.1 pct.2
1           0.5000921 0.203 0.160
2          -1.0777578 0.307 0.480
3          -0.7365891 0.098 0.119
4           0.6842325 0.497 0.334
5           0.5003989 0.266 0.203
6           0.9667813 0.170 0.059
7          -0.9010823 0.083 0.131
8          -0.9978768 0.202 0.335
9           0.4948954 0.471 0.344
10         -0.9802111 0.248 0.421
11          1.1485216 0.347 0.197
12          2.0182523 0.176 0.065
13          2.4622133 0.285 0.102
14         -0.7551728 0.083 0.129
15          1.2016452 0.315 0.193
16          0.9431350 0.144 0.085
17          1.1063026 0.170 0.086
18          0.6042933 0.191 0.134
19          0.9243851 0.493 0.328
20         -0.9392724 0.074 0.122
21          0.4975993 0.108 0.074

write.csv(
  priority_astro_lncRNAs,
  file.path(results_dir, "Step12_priority_astro_lncRNA_candidates.csv"),
  row.names = FALSE
)

Step 12.11: Systematically Evaluate Selected lncRNA Candidates

Instead of relying only on p-value, we evaluate selected lncRNAs using biological direction.

selected_lncRNAs <- c(
  # PD-up / reactive-associated candidates
  "ADAMTS9-AS2",
  "SLC44A3-AS1",
  "LINC03051",
  "LINC00472",
  "LINC01088",
  "AQP4-AS1",
  
  # PD-down / homeostatic-loss candidates
  "LINC01505",
  "LINC02882",
  "LINC01170"
)

selected_lncRNAs <- intersect(selected_lncRNAs, rownames(SO_astro))

selected_lncRNA_eval <- lapply(selected_lncRNAs, function(gene_i) {
  
  expr_df <- FetchData(
    SO_astro,
    vars = c(
      gene_i,
      "astro_pseudotime",
      "condition",
      "Reactive_astro_score1",
      "Stress_score1",
      "ECM_score1"
    )
  ) %>%
    rownames_to_column("cell_id") %>%
    filter(!is.na(astro_pseudotime))
  
  tibble(
    gene = gene_i,
    cor_pseudotime = cor(expr_df[[gene_i]], expr_df$astro_pseudotime, method = "spearman", use = "complete.obs"),
    cor_reactive = cor(expr_df[[gene_i]], expr_df$Reactive_astro_score1, method = "spearman", use = "complete.obs"),
    cor_stress = cor(expr_df[[gene_i]], expr_df$Stress_score1, method = "spearman", use = "complete.obs"),
    cor_ecm = cor(expr_df[[gene_i]], expr_df$ECM_score1, method = "spearman", use = "complete.obs"),
    mean_expr_PD = mean(expr_df[[gene_i]][expr_df$condition == "PD"], na.rm = TRUE),
    mean_expr_C = mean(expr_df[[gene_i]][expr_df$condition == "C"], na.rm = TRUE),
    delta_PD_minus_C = mean_expr_PD - mean_expr_C,
    percent_expr_PD = mean(expr_df[[gene_i]][expr_df$condition == "PD"] > 0, na.rm = TRUE),
    percent_expr_C = mean(expr_df[[gene_i]][expr_df$condition == "C"] > 0, na.rm = TRUE)
  )
}) %>%
  bind_rows() %>%
  left_join(
    priority_astro_lncRNAs %>%
      select(
        gene,
        pseudotime_pvalue = pvalue,
        pseudotime_meanLogFC = meanLogFC,
        p_val_adj_PD_vs_C,
        avg_log2FC_PD_vs_C,
        pct.1,
        pct.2
      ),
    by = "gene"
  ) %>%
  mutate(
    direction = case_when(
      avg_log2FC_PD_vs_C > 0 ~ "PD_up",
      avg_log2FC_PD_vs_C < 0 ~ "PD_down",
      TRUE ~ "neutral"
    ),
    biological_score =
      (cor_pseudotime > 0) +
      (cor_reactive > 0) +
      (cor_stress > 0) +
      (cor_ecm > 0) +
      (avg_log2FC_PD_vs_C > 0) +
      (delta_PD_minus_C > 0)
  ) %>%
  arrange(
    desc(biological_score),
    desc(avg_log2FC_PD_vs_C),
    desc(cor_reactive),
    desc(cor_stress),
    desc(cor_ecm)
  )

selected_lncRNA_eval

# A tibble: 9 × 18
  gene   cor_pseudotime cor_reactive cor_stress cor_ecm mean_expr_PD mean_expr_C
  <chr>           <dbl>        <dbl>      <dbl>   <dbl>        <dbl>       <dbl>
1 SLC44…          0.328        0.378      0.262   0.384        0.749       0.124
2 LINC0…          0.344        0.298      0.234   0.301        0.229       0.214
3 ADAMT…          0.515        0.480      0.361   0.478        0.565       0.309
4 LINC0…          0.557        0.532      0.393   0.530        0.955       0.496
5 AQP4-…          0.494        0.451      0.339   0.448        0.489       0.304
6 LINC0…          0.584        0.473      0.375   0.470        0.520       0.567
7 LINC0…         -0.533       -0.559     -0.429  -0.528        0.345       0.827
8 LINC0…         -0.479       -0.477     -0.395  -0.428        0.258       0.641
9 LINC0…         -0.661       -0.618     -0.484  -0.573        0.522       1.19 
# ℹ 11 more variables: delta_PD_minus_C <dbl>, percent_expr_PD <dbl>,
#   percent_expr_C <dbl>, pseudotime_pvalue <dbl>, pseudotime_meanLogFC <dbl>,
#   p_val_adj_PD_vs_C <dbl>, avg_log2FC_PD_vs_C <dbl>, pct.1 <dbl>,
#   pct.2 <dbl>, direction <chr>, biological_score <int>

write.csv(
  selected_lncRNA_eval,
  file.path(results_dir, "Step12_selected_lncRNA_systematic_evaluation.csv"),
  row.names = FALSE
)

Step 12.12: Visualize Selected lncRNA Candidates

plot_pseudotime_genes(
  object = SO_astro,
  genes = selected_lncRNAs,
  output_name = "Step12_selected_lncRNAs_pseudotime.png",
  width = 12,
  height = 10
)

`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

plot_feature_genes(
  object = SO_astro,
  genes = selected_lncRNAs,
  output_name = "Step12_selected_lncRNAs_FeaturePlot.png",
  width = 12,
  height = 8
)

plot_vln_genes(
  object = SO_astro,
  genes = selected_lncRNAs,
  output_name = "Step12_selected_lncRNAs_PD_vs_C_VlnPlot.png",
  width = 12,
  height = 6
)

Step 12.13: Save Final Astrocyte lncRNA Discovery Object

save_path <- file.path(
  scripts_dir,
  "SO_12_Astrocyte_lncRNA_Discovery.RDS"
)

saveRDS(SO_astro, file = save_path)

print("Step 12 complete: astrocyte pseudotime lncRNA discovery finished.")

[1] "Step 12 complete: astrocyte pseudotime lncRNA discovery finished."

Interpretation

The results suggest that the selected astrocyte trajectory captures a biologically meaningful transition from baseline astrocytes toward reactive/stressed astrocyte states. Known reactive astrocyte genes such as CHI3L1, TNC, HSPB1, VIM, GFAP, C3, SLC1A2, and AQP4 changed along pseudotime, supporting the interpretation that this lineage reflects astrocyte reactivity rather than a purely technical or clustering artifact.

Among the lncRNA candidates, two major groups were observed.

PD-up / reactive-associated lncRNAs:
ADAMTS9-AS2, SLC44A3-AS1, LINC03051, LINC00472, LINC01088, AQP4-AS1

PD-down / homeostatic-loss lncRNAs:
LINC01505, LINC02882, LINC01170

Among the PD-up candidates, ADAMTS9-AS2 showed the clearest pattern. Its expression increased toward late astrocyte pseudotime and was higher in PD astrocytes. Because ADAMTS-related genes are often connected to extracellular matrix remodeling, ADAMTS9-AS2 may be related to the ECM-remodeling aspect of reactive astrocyte activation.

SLC44A3-AS1 and LINC03051 also showed strong PD-associated expression, especially around the reactive transition phase of the trajectory. These genes may represent lncRNAs involved in intermediate reactive astrocyte states rather than only the final late-reactive state.

In contrast, LINC01505, LINC02882, and LINC01170 decreased along astrocyte pseudotime and were lower in PD astrocytes. These lncRNAs may represent homeostatic astrocyte-associated transcripts that are lost during PD-related reactive transformation.

Overall, this step suggests that PD-associated astrocyte reactivity is accompanied by both gain of reactive-state lncRNAs and loss of homeostatic astrocyte lncRNAs. The strongest candidates from this analysis are:

Reactive / PD-up candidates:
ADAMTS9-AS2, SLC44A3-AS1, LINC03051

Homeostatic-loss / PD-down candidates:
LINC01505, LINC02882, LINC01170

These candidates should not yet be interpreted as causal drivers. They should be treated as prioritized lncRNA candidates for downstream validation using gene biotype annotation, independent PD datasets, regulatory interaction analysis, and functional experiments.