suppressMessages(library(igraph))
suppressMessages(library(ggplot2))
suppressMessages(library(dplyr))
suppressMessages(library(RColorBrewer))
suppressMessages(library(huge))
suppressMessages(library(xlsx))
set.seed(42)
fl.sources <- list.files("../../scripts/utils/", full.names = T)
tmp <- sapply(fl.sources,source)
Since each context (TRM or TexTerm) has several networks in different replicate samples, we first calculated the mean edge weight of TF-regulatee pair across samples. We kept regulatees with moderate variation across TFs (sd>1). Then we calculated the spearman correlation of TFs based on the edge weight profile.
edge_combined.csv in each network/
foldermean_edge_weight.csv: rows are
regulatees (regulated downstream genes) and columns are TFsTF_corr.csv: rows and columns are TFs,
symmetric matrix with diagonal as 1.We’ll skip this part for demo purpose. The output file is provided.
# sample <- 'TRM' # 'TRM' or 'TexTerm'
# tag <- '_subset_TFs_v2'
#
# L <- readLines(paste0(sample,'_samples.txt'))
# TFs <- readLines(paste0(sample,'_TFs_for_network_v3.txt'))
# TFs <- lapply(TFs, toupper)
#
#
# edge_weight <- do.call("rbind", lapply(L, function(x) read.table(paste0(x,'/edges_combined.csv'), sep = ",", header = T) %>% select(c(1,2,3)) %>% setNames(c("TF","regulatee","weight")))) %>% dplyr::filter(TF %in% TFs) %>%
# group_by(TF, regulatee) %>% dplyr::summarize(weight = mean(weight, na.rm=TRUE)) %>%
# reshape2::dcast(regulatee ~ TF, value.var = "weight", fill=0) %>%
# tibble::column_to_rownames(var = "regulatee")
#
# df <- edge_weight %>% rowwise() %>% mutate(sd=sd(c_across(everything()))) %>% ungroup() %>%
# filter(sd>1) %>% select(!matches("MOUSE$|[0-9\\.]{6}$")) %>% select(-sd) %>%
# rename_with(firstup) %>% as.matrix %>% cor(method = "spearman") %>% as.data.frame
# write.csv(df,file = paste0("TF_corr_",sample,tag,".csv")) # write to file
# load TF-TF correlation file
sample <- 'TexTerm' ### 'TRM' or 'TexTerm'
tag <- '_subset_TFs_v2'
df <- read.csv(paste0("TF_corr_",sample,tag,".csv"), row.names = 1)
knitr::kable(head(df[,1:6]), caption = 'TF-TF spearman correlation') |> kableExtra::kable_styling(latex_options = 'scale_down')
| Ahr | Arid3a | Arnt | Arntl | Atf1 | Atf2 | |
|---|---|---|---|---|---|---|
| Ahr | 1.0000000 | 0.0159095 | 0.1552732 | 0.0674772 | 0.0967936 | 0.0653700 |
| Arid3a | 0.0159095 | 1.0000000 | 0.1085567 | 0.0820837 | 0.1034220 | 0.1501552 |
| Arnt | 0.1552732 | 0.1085567 | 1.0000000 | 0.8563119 | 0.1860339 | 0.1763554 |
| Arntl | 0.0674772 | 0.0820837 | 0.8563119 | 1.0000000 | 0.1636281 | 0.1480576 |
| Atf1 | 0.0967936 | 0.1034220 | 0.1860339 | 0.1636281 | 1.0000000 | 0.5753676 |
| Atf2 | 0.0653700 | 0.1501552 | 0.1763554 | 0.1480576 | 0.5753676 | 1.0000000 |
print(dim(df))
## [1] 170 170
We used R package huge to construct Gaussian graphical
models (GGM) with copula modification. A little bit background to help
with selction of hyper-parameters.
GGM assumes that the observations have a multivariate Gaussian distribution with mean \(µ\), and covariance matrix \(Σ\). The conditional independence can be implied by the inverse covariance (concentration) matrix \(Ω = Σ^{−1}\). If \(Ω_{jk} = 0\), then the i-th variable and j-th variables are conditional independent given all other variables.
This important property serves as the foundation for GGM to infer direct interactions from high-dimensional data. Unlike relevance networks or correlation networks, in which edges are determined based on marginal correlations, GGM provides a stronger criterion of dependency, and thus further reduces the false positive rate.
However, a great limitation of classic learning methods for GGM is the lack of sparsity in the resulting graph. A dense graph not only complicates downstream analysis but also raises the issue of overfitting the data. Thus it makes sense to impose an \(ℓ\) penalty for the estimation of \(Ω\), to increase its sparsity, and the sparse pattern of \(Ω\) is essentially the same as the adjacency matrix of the underlying undirected graph.
Additionally, Liu et al. developed a data transformation method called Copula that can be used with the graphical Lasso algorithm to relax the normality assumption of GGM. The intuition of Copula is to infer the the relationship of the variables without being influenced by their individual distributions
For more details about package huge, check the vignette.
For instructions on selecting the hyper-parameters like
npn.func, check the manual
npn.func <- 'shrinkage'
df.npn <- huge.npn(df, npn.func = npn.func) # default
## Conducting the nonparanormal (npn) transformation via shrunkun ECDF....done.
out.npn <- huge(df.npn, method = "glasso", nlambda = 30, lambda.min.ratio = 0.01) # fit glasso model to data
##
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 3%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 6%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 9%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 13%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 16%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 19%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 23%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 26%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 30%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 33%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 36%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 40%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 43%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 46%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 50%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 53%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 56%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 60%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 63%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 66%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 70%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 73%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 76%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 80%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 83%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 86%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 90%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 93%
Conducting the graphical lasso (glasso) wtih lossless screening....in progress: 96%
## Conducting the graphical lasso (glasso)....done.
saveRDS(out.npn,paste0("result_copula_",sample,'_',npn.func,tag,".rds"))
The program automatically sets up a sequence of 30 regularization parameters and estimates the corresponding graph path.
# default visualization provided by huge package
plot(out.npn)
From the above plot, we can see the local minimum of sparsity is 0.15. We can infer the precision matrix when sparsity=0.15
sparsity <- 0.15
idx = which(out.npn$sparsity>=sparsity & out.npn$sparsity<=sparsity+0.01)[1]
print(paste0("lambda: ", out.npn$lambda[idx]))
## [1] "lambda: 0.0572582995483069"
print(paste0("sparsity: ", out.npn$sparsity[idx]))
## [1] "sparsity: 0.157117995127045"
m = out.npn$icov[[idx]]
rownames(m) <- rownames(df)
colnames(m) <- colnames(df)
knitr::kable(head(m[,1:6]), caption = 'sparse TF-TF correlation matrix') |> kableExtra::kable_styling(latex_options = 'scale_down')
| Ahr | Arid3a | Arnt | Arntl | Atf1 | Atf2 | |
|---|---|---|---|---|---|---|
| Ahr | 3.1226310 | 0.0143988 | -0.0456422 | 0.0000000 | -0.1233884 | 0.0000000 |
| Arid3a | 0.0142164 | 4.9156918 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
| Arnt | -0.0457246 | 0.0000000 | 7.0673992 | -0.9341928 | 0.0000000 | 0.0000000 |
| Arntl | 0.0000000 | 0.0000000 | -0.9345756 | 7.2953688 | 0.0000000 | 0.0000000 |
| Atf1 | -0.1234237 | 0.0000000 | 0.0000000 | 0.0000000 | 5.7669647 | -0.0311552 |
| Atf2 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | -0.0309058 | 5.5925174 |
## write to file
write.csv(m, paste0("precision_copula_",sample,"_",npn.func,"_sparsity_",sparsity,tag,".csv"))
df <- data.frame(row=rownames(m)[row(m)[upper.tri(m)]],
col=colnames(m)[col(m)[upper.tri(m)]],
corr=m[upper.tri(m)])
### remove zero entries
df <- df[df$corr!=0,]
knitr::kable(head(df), caption = 'long form of TF correlation') |> kableExtra::kable_styling(latex_options = 'scale_down')
| row | col | corr | |
|---|---|---|---|
| 1 | Ahr | Arid3a | 0.0143988 |
| 2 | Ahr | Arnt | -0.0456422 |
| 6 | Arnt | Arntl | -0.9341928 |
| 7 | Ahr | Atf1 | -0.1233884 |
| 15 | Atf1 | Atf2 | -0.0311552 |
| 20 | Atf1 | Atf7 | -0.5549663 |
print(dim(df))
## [1] 2257 3
print(paste('percent of correlation score >','0',':',sum(df$corr > 0) *100 / nrow(df),"%"))
## [1] "percent of correlation score > 0 : 13.4692069118299 %"
print(paste('percent of correlation score <','0',':',sum(df$corr < 0) *100 / nrow(df),"%"))
## [1] "percent of correlation score < 0 : 86.5307930881701 %"
df$corr = abs(df$corr)
write.csv(df, paste0("correlation_intensity_table_copula_",sample,"_",npn.func,"_sparsity_",sparsity,tag,".csv"))
Node annotation came from the results of Fig. S3a.
TRM_Tex: multi-tasker in both TRM and TexTermTex_specific/TRM_specific: single-tasker
in TexTerm/TRMTex_important/TRM_important: important in
TexTerm/TRMHousekeeping: 54 universal TFs with small variation
across cell typesnodes <- read.xlsx('TRM_Tex_TF_list_20240121.xlsx', sheetName = sample) %>% tidyr::drop_na(TF) %>% dplyr::mutate(group=ifelse(is.na(Specificity), Important, Specificity)) %>% select(TF, group)
names(nodes) <- c('name','group')
knitr::kable(head(nodes), caption = 'node meta') |> kableExtra::kable_styling(latex_options = 'scale_down')
| name | group |
|---|---|
| Bhlhe40 | TRM_Tex |
| Nr4a2 | TRM_Tex |
| Snai3 | TRM_Tex |
| Xbp1 | TRM_Tex |
| Egr1 | TRM_Tex |
| Nfil3 | TRM_Tex |
print(dim(nodes))
## [1] 170 2
g = graph_from_data_frame(df, directed=FALSE, vertices = nodes)
print(g, e=TRUE, v=TRUE)
## IGRAPH 051863b UN-- 170 2257 --
## + attr: name (v/c), group (v/c), corr (e/n)
## + edges from 051863b (vertex names):
## [1] Ahr --Arid3a Ahr --Arnt Arnt --Arntl Ahr --Atf1
## [5] Atf2 --Atf1 Atf1 --Atf7 Atf2 --Atf7 Arid3a --Bbx
## [9] Bhlhe40--Arnt Bhlhe40--Arntl Bhlhe41--Arid3a Bhlhe41--Arnt
## [13] Bhlhe41--Arntl Bhlhe40--Bhlhe41 Ahr --Cbfb Cbfb --Atf1
## [17] Batf --Cbfb Cebpb --Atf2 Ahr --Creb1 Creb1 --Atf1
## [21] Creb1 --Atf2 Creb1 --Atf7 Bhlhe41--Creb1 Crem --Ahr
## [25] Crem --Atf1 Crem --Atf2 Crem --Atf7 Crem --Bhlhe41
## [29] Crem --Creb1 Ctcf --Atf1 Ctcf --Bbx Cebpb --Ctcf
## + ... omitted several edges
Leiden clustering generates communities. Adjust the res
to get a reasonable number of groups and high modularity score. High
res gives more groups. Through experimenting, res=0.9 suits
the best.
set.seed(42)
res <- 0.9
clustering = cluster_leiden(g, objective_function="modularity",resolution_parameter=res)
print(clustering)
## IGRAPH clustering leiden, groups: 5, mod: NA
## + groups:
## $`1`
## [1] "Bhlhe40" "Xbp1" "Max" "Setbp1" "Mxd1" "Tigd2" "Heyl"
## [8] "Bhlhe41" "Tfeb" "Usf1" "Arnt" "Mlx" "Srebf1" "Arntl"
## [15] "Tfe3"
##
## $`2`
## [1] "Nr4a2" "Nfil3" "Gfi1" "Crem" "Tef" "Zbtb3" "Homez"
## [8] "Thrb" "Ddit3" "Jun" "Irf4" "Meox2" "Fos" "Isl2"
## [15] "Prdm1" "Nfe2l1" "Dmrt2" "Eomes" "Zbtb20" "Irf8" "Arid3a"
## [22] "Foxj2" "Batf" "Nfatc1" "Jund" "Cebpb" "Vax2" "Foxd2"
## + ... omitted several groups/vertices
print(sizes(clustering))
## Community sizes
## 1 2 3 4 5
## 15 66 24 45 20
print(paste0('modularity:',modularity(g, membership(clustering))))
## [1] "modularity:0.372509275038383"
saveRDS(clustering,paste0('clustering_copula_',sample,'_',npn.func,'_res',res,'_s',sparsity,tag,'.rds'))
## write clustering result
clust <- data.frame(name = names(membership(clustering)),cluster = as.numeric(membership(clustering))) %>%
dplyr::inner_join(nodes, by = "name") %>%
dplyr::arrange(group, cluster)
knitr::kable(head(clust), caption = 'community clustering') |> kableExtra::kable_styling(latex_options = 'scale_down')
| name | cluster | group |
|---|---|---|
| Creb1 | 2 | Housekeeping |
| Stat6 | 2 | Housekeeping |
| Irf3 | 2 | Housekeeping |
| Stat4 | 2 | Housekeeping |
| Gata3 | 2 | Housekeeping |
| Irf1 | 2 | Housekeeping |
write.csv(clust,paste0('clustering_membership_copula_',sample,'_',npn.func,'_res',res,'_s',sparsity,tag,'.csv'), row.names=F)
We need to set some parameters before plot.
# load genes which we want to highlight in the plot
label_targets <- readLines(paste0(sample,'_gene_annotation.txt'))
common_targets <- readLines('common_genes_annotated.txt')
## generate top degree nodes for each community
clust <- data.frame(name = names(membership(clustering)),
cluster = as.numeric(membership(clustering)),
degree=degree(g)) %>%
dplyr::inner_join(nodes, by = "name") %>%
dplyr::group_by(cluster) %>% slice_max(order_by=degree, n = 5) %>% pull(name) %>%
writeLines(paste0('top_degree_nodes_per_community_',sample,'_',npn.func,'_res',res,"_s",sparsity,"_",sample,tag,'.txt'))
# combine top degree nodes to labeled list as well
tmp <- readLines(paste0('top_degree_nodes_per_community_',sample,'_',npn.func,'_res',res,"_s",sparsity,"_",sample,tag,'.txt'))
label_targets <- union(label_targets, tmp)
## add cluster info
V(g)$cluster <- clustering$membership # add community info to graph
V(g)$group <- ifelse(V(g)$name %in% label_targets,sample,"other")
V(g)$group2 <- ifelse(V(g)$name %in% common_targets,"common","specific")
# generate colors based on cluster
colors <- RColorBrewer::brewer.pal(length(clustering),"Set2")
V(g)$color <- colors[V(g)$cluster]
V(g)$color <- ifelse(V(g)$group2=='common',V(g)$color,paste0(V(g)$color,'7f')) # if common TFs, use darker shade
V(g)$frame.color <- "NA"
V(g)$label <- ifelse(V(g)$group == "other",NA,V(g)$name)
V(g)$label.color <- 'black'
V(g)$size <- 0.3*degree(g)
V(g)$label.cex <- 0.75
# set distance of label
V(g)$label.dist <- 0.4
# set edge color
E(g)$color <- "gray80"
# E(g)$size <- 0.1
E(g)$width <- 1.5*E(g)$corr
cut_off <- 0.9
E(g)$lty <- ifelse(E(g)$corr>=unname(quantile(E(g)$corr, cut_off)), 1, 0) # only show top edges
# set the within-module edges to some large weight, and the between module edges to some small weight
# and then choose 'layout_with_fr' to make the grouped layout
edge.weights <- function(community, network, weight.within = 10, weight.between = 1) {
bridges <- crossing(communities = community, graph = network)
weights <- ifelse(test = bridges, yes = weight.between, no = weight.within)
return(weights)
}
E(g)$weight <- edge.weights(clustering, g)
plot(g, layout=layout_with_fr)
## write graph to object
saveRDS(g, paste0("graph_",npn.func,'_res',res,"_s",sparsity,"_",sample,"_",tag,".rds"))
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.6.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggpubr_0.6.0 xlsx_0.6.5 huge_1.3.5 RColorBrewer_1.1-3
## [5] dplyr_1.1.2 ggplot2_3.4.2 igraph_1.5.0
##
## loaded via a namespace (and not attached):
## [1] xlsxjars_0.6.1 sass_0.4.6 utf8_1.2.3 generics_0.1.3
## [5] tidyr_1.3.0 xml2_1.3.4 rstatix_0.7.2 stringi_1.7.12
## [9] lattice_0.21-8 digest_0.6.31 magrittr_2.0.3 evaluate_0.21
## [13] fastmap_1.1.1 jsonlite_1.8.7 Matrix_1.6-5 backports_1.4.1
## [17] httr_1.4.6 rvest_1.0.3 purrr_1.0.1 fansi_1.0.4
## [21] viridisLite_0.4.2 scales_1.2.1 jquerylib_0.1.4 abind_1.4-5
## [25] cli_3.6.1 rlang_1.1.1 munsell_0.5.0 withr_2.5.0
## [29] cachem_1.0.8 yaml_2.3.7 tools_4.3.1 ggsignif_0.6.4
## [33] colorspace_2.1-0 webshot_0.5.5 kableExtra_1.3.4 broom_1.0.5
## [37] vctrs_0.6.3 R6_2.5.1 lifecycle_1.0.3 stringr_1.5.0
## [41] car_3.1-2 MASS_7.3-60 pkgconfig_2.0.3 rJava_1.0-6
## [45] pillar_1.9.0 bslib_0.5.0 gtable_0.3.3 glue_1.6.2
## [49] Rcpp_1.0.10 systemfonts_1.0.4 highr_0.10 xfun_0.39
## [53] tibble_3.2.1 tidyselect_1.2.0 rstudioapi_0.14 knitr_1.43
## [57] htmltools_0.5.5 svglite_2.1.1 carData_3.0-5 rmarkdown_2.23
## [61] compiler_4.3.1