Background

In the study “3D promoter architecture re-organization during iPSC-derived neuronal cell differentiation implicates target genes for neurodevelopmental disorders,” the authors investigate the epigenetic and structural changes underlying gene expression during the transition from neuronal progenitor cells (NPCs) to neurons.

They report that promoter accessibility (prOCRs) generally correlates with gene activation: for instance, neuron-specific genes such as SYN1, STMN4, RBFOX3, and NEFL showed increased promoter accessibility alongside higher expression in neurons. However, this trend was not universally true. A substantial number of downregulated genes also retained open promoter regions, suggesting that promoter accessibility alone does not fully account for transcriptional dynamics during differentiation.

To explore the regulatory complexity further, the authors integrated ATAC-seq, RNA-seq, and high-resolution promoter-focused Capture-C data. Their analysis revealed that long-range interactions between promoters and distal cis-regulatory elements (cREs)—such as enhancers or silencers—play a critical role in defining cell-type-specific gene expression. For example, downregulation of PAX6 in neurons coincided with a loss of enhancer contacts and a gain in interactions with repressive chromatin, despite unchanged promoter accessibility. Conversely, NEFL gained enhancer contacts in neurons, aligning with its increased expression.

These findings highlight that 3D chromatin architecture and distal regulatory element engagement provide a more comprehensive explanation for gene expression changes than promoter accessibility alone. Therefore, understanding transcriptional regulation during neurodevelopment requires integrating chromatin accessibility, spatial genome organization, and epigenomic context.

Hypothesis

Q1:

Does the formation of more enhancer-promoter loops lead to increased cell-type-specific gene expression following neuronal differentiation?

Q2:

Can integrating GWAS data with eQTL and enhancer (eRE) maps enhance our ability to interpret gene expression changes in a cell-type-specific context?

Loading Dataset

Loading CSV files from ScienceDirect Paper

Although several supplementary datasets were provided with the paper, we focused on Supplementary Tables 8 and 9, which detail genome-wide interactions between genes and candidate regulatory elements (cREs) in NPCs and Neurons, respectively, as they were most relevant to our hypothesis.

NPC <- read.csv("/home/govind/Downloads/ScienceDirect_files_11Jun2025_14-00-58.641/1-s2.0-S0301008221000149-mmc17.csv")
Neurons <- read.csv("/home/govind/Downloads/ScienceDirect_files_11Jun2025_14-00-58.641/1-s2.0-S0301008221000149-mmc18.csv")

Exploring Datasets

head(NPC)
##   ocr_chr ocr_start ocr_end   gene_name            gene_id chicago_score
## 1    chr1    839597  840453       ACAP3 ENSG00000131584.14           9.5
## 2    chr1    839597  840453    C1orf170  ENSG00000187642.5           6.6
## 3    chr1    839597  840453      KLHL17  ENSG00000187961.9           8.3
## 4    chr1    839597  840453     PLEKHN1  ENSG00000187583.6           8.3
## 5    chr1    839597  840453       PUSL1  ENSG00000169972.7           9.5
## 6    chr1    839597  840453 RP11-54O7.1  ENSG00000230699.2        6.1|11
head(Neurons)
##   ocr_chr ocr_start ocr_end      gene_name            gene_id chicago_score
## 1    chr1    839597  840453         KLHL17  ENSG00000187961.9            11
## 2    chr1    839597  840453        PLEKHN1  ENSG00000187583.6            11
## 3    chr1    839597  840453    RP11-54O7.1  ENSG00000230699.2           6.5
## 4    chr1    839597  840453 TCONS_00001368     TCONS_00001368           6.5
## 5    chr1    846026  847146         KLHL17  ENSG00000187961.9           6.4
## 6    chr1    846026  847146          MXRA8 ENSG00000162576.12           5.3

Mutating Dataset to add cRE IDs to give each enhancer a unique identity

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
NPC<- NPC %>% mutate(cRE_ID = paste(ocr_chr, ocr_start, ocr_end, sep = "_"))
Neurons<- Neurons %>% mutate(cRE_ID = paste(ocr_chr, ocr_start, ocr_end, sep = "_"))

Grouping them by gene names and counting unique interactions

# For example: count cREs per gene in NPCs
library(dplyr)

npc_loops_per_gene <- NPC %>%
  group_by(gene_name) %>%
  summarise(Num_cREs = n_distinct(cRE_ID))

# Same for neurons
neuron_loops_per_gene <- Neurons %>%
  group_by(gene_name) %>%
  summarise(Num_cREs = n_distinct(cRE_ID))

Finding top genes

# Top 10 genes with most loops
top_npc_genes <- npc_loops_per_gene %>% arrange(desc(Num_cREs)) %>% head(10)
top_neuron_genes <- neuron_loops_per_gene %>% arrange(desc(Num_cREs)) %>% head(10)

Plotting Top Genes

library(ggplot2)

ggplot(top_neuron_genes, aes(x = reorder(gene_name, Num_cREs), y = Num_cREs)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top Genes with Most Enhancer Loops in Neurons",
       x = "Gene", y = "Number of Enhancer Loops")

library(ggplot2)

ggplot(top_npc_genes, aes(x = reorder(gene_name, Num_cREs), y = Num_cREs)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top Genes with Most Enhancer Loops in NPCs",
       x = "Gene", y = "Number of Enhancer Loops")

Gene Ontology Analysis

Preparing my gene list (e.g., top 100 neuron genes)

top_100_neuron_genes <- neuron_loops_per_gene %>% 
  arrange(desc(Num_cREs)) %>% 
  slice(1:100)  # top 100 genes

top_100_npc_genes <- npc_loops_per_gene %>% 
  arrange(desc(Num_cREs)) %>% 
  slice(1:100)  # top 100 genes

For David

# Save Gene Symbols 
write.table(top_100_npc_genes$gene_name,
            file = "top_100_npc_genes_symbol.txt",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

write.table(top_100_neuron_genes$gene_name,
            file = "top_100_neuron_genes_symbol.txt",
            row.names = FALSE, col.names = FALSE, quote = FALSE)