You can install the development version of vssHiC from Github.
is_installed = "vssHiC" %in% rownames(installed.packages())
if (! is_installed){
library(devtools)
devtools::install_github("nedashokraneh/vssHiC")
}
library(vssHiC)
library(coolR)
library(ggplot2)
library(ggpubr)
library(HiContacts)
Genome-wide chromosome conformation capture assay Hi-C provides
insights into chromatin 3D structures and their functional implications.
Read counts from the Hi-C assay do not have constant variance, meaning
we expect an observed interaction between a pair of genomic bins with an
observed interaction of 1000 reads to have different (usually higher)
uncertainty than another pair of genomic bins with 100 reads. This
property of heteroskedasticity impedes visualization and downstream
analyses. vssHiC is a statistical method to transform read
counts so transformed signals have a unit variance across a dynamic
range given a Hi-C experiment with two biological replicates available.
We show that the heatmap visualization of pairwise contacts within a
genomic region is more visually appealing given vssHiC
signals compared to the raw and log-transformed signals. Furthermore,
vssHiC signals improve the performance of subcompartment
caller relying on Gaussian observed variables. vssHiC works
with a well-developed R class (InteractionSet) and file
format (.cool) for Hi-C data.
vssHiC takes two processed Hi-C data in
.cool format as input and vssHiC::cool2IntSet
loads interactions corresponding to one pair of chromosomes into
InteractionSet object. vssHiC::cool2IntSet is
adapted from coolR::cool2IntSet with two small
modifications to load InteractionSet object corresponding
to only one pair of chromosomes and handle inter-chromosomal
interactions.
files = list("../sample_data/GM12878_rep1.cool",
"../sample_data/GM12878_rep2.cool")
chr = "21"
res = NULL
iset = vssHiC::cool2IntSet(files, chr, chr, res)
iset
## class: InteractionSet
## dim: 60667 2
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames(2): IF1 IF2
## colData names(1): totals
## type: GInteractions
## regions: 357
Estimated mean and standard deviations (stdev) across a dynamic range based on VSS algorithm show the dependency between mean and variance such that higher signal intensities have higher variance. This dependency is robust to auxiliary type choice (whether replicate 1 or replicate 2 is used as an auxiliary signal) as long as total read counts of two replicates are not very different.
# total number of read read counts per replicate
print(colSums(assay(iset)))
## IF1 IF2
## 14944983 12594605
train.result1 = vssHiC::vssHiC.train(iset, "rep1")
train.result2 = vssHiC::vssHiC.train(iset, "rep2")
ggplot() +
geom_line(data = train.result1$vss_mean_sd[train.result1$vss_mean_sd$mean < 500, ],
aes(x = mean, y = pred_sd, colour = "rep1")) +
geom_line(data = train.result2$vss_mean_sd[train.result2$vss_mean_sd$mean < 500, ],
aes(x = mean, y = pred_sd, colour = "rep2")) +
xlab("Mean") + ylab("Fitted stdev") +
theme(axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12)) +
scale_color_manual(name = "Aux type",
values = c("rep1" = "darkblue",
"rep2" = "red"))
vss.iset = vssHiC::vssHiC.transform(iset, train.result1$vss_fit)
vss.iset
## class: InteractionSet
## dim: 60667 2
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames(2): rep1 rep2
## colData names(0):
## type: GInteractions
## regions: 357
After variance stabilization, vssHiC signals have an approximate unit variance (stdev = 1) across a dynamic range.
vss.result = vssHiC::vssHiC.train(vss.iset, "rep1")
ggplot(data = vss.result$vss_mean_sd) +
geom_point(aes(x = mean, y = sd)) +
geom_line(aes(x = mean, y = pred_sd), colour = "green") +
xlab("Mean") + ylab("Stdev") +
theme(axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12))
One application of signals with a unit variance is their
visualization. For example, the heatmap visualization of chromatin
contact maps should reflect differences across a dynamic range of
signals including small intensity of long-range interactions and large
intensity of short-range interactions. A common heuristic variance
stabilizing transformation, log function, inflates the variance of
low-intensity signals while stabilizing the variance of larger signals.
Therefore, the visualization of log-scaled contact maps results in a
blurry expected plaid pattern of long-range interactions. The
plotMatrix function from the HiContacts
package designs a manual color map (coolerColors) carefully
picked for Hi-C contact heatmaps for visualization of log-scaled
signals. We show that signals with a unit variance alleviate the need
for manual color maps and avoid blurry plaid patterns even with default
color maps like bgrColors.
vss_GI = iset2GI(vss.iset, 1)
vss_GI_sub = subset_GI(vss_GI, 100000, 30000000, 42000000)
vss_mat_plot1 = plotMatrix(vss_GI_sub, scale = "linear", cmap = bgrColors())
vss_mat_plot2 = plotMatrix(vss_GI_sub, scale = "linear", cmap = coolerColors())
ggarrange(vss_mat_plot1,
vss_mat_plot2,
ncol=2)
raw_GI = iset2GI(iset, 1)
raw_GI_sub = subset_GI(raw_GI, 100000, 30000000, 42000000)
log_mat_plot1 = plotMatrix(raw_GI_sub, scale = "log10", cmap = bgrColors())
log_mat_plot2 = plotMatrix(raw_GI_sub, scale = "log10", cmap = coolerColors())
ggarrange(log_mat_plot1,
log_mat_plot2,
ncol=2)
vssHiC has wrapper functions for three TAD callers, TopDom, SpectralTAD, and
HiCseg to take
InteractionSet as input, call TADs, and return
GRanges object including identified TAD ranges.
GRanges objects can be easily visualized on top of the
plotMatrix output.
rep_num = 1
hicseg_res = iset_hicseg(iset,
rep_num,
max_cp = 40,
model = "D",
dist = "G")
window_size = 2000000
topdom_res = iset_topdom(iset,
rep_num,
100000,
window_size)
spectral_res = iset_spectralTAD(iset,
rep_num,
100000,
window_size)
spectral_plot = plotMatrixBorder(vss_GI_sub, spectral_res,
scale = "linear", title = "SpectralTAD")
topdom_plot = plotMatrixBorder(vss_GI_sub, topdom_res,
scale = "linear", title = "TopDom")
hicseg_plot = plotMatrixBorder(vss_GI_sub, hicseg_res,
scale = "linear", title = "HiCseg")
ggarrange(spectral_plot,
topdom_plot,
hicseg_plot,
ncol=2)
## $`1`
##
## $`2`
##
## attr(,"class")
## [1] "list" "ggarrange"
Neda Shokraneh Kenari, Faezeh Bayat, Maxwell Libbrecht. VSS-Hi-C: Variance-stabilized signals for chromatin contacts. bioRxiv 2021.10.19.
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
## [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
## [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/Vancouver
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] HiContacts_1.5.0 HiCExperiment_1.2.0
## [3] ggpubr_0.6.0 ggplot2_3.5.0
## [5] coolR_0.0.0.9000 rhdf5_2.46.1
## [7] InteractionSet_1.30.0 SummarizedExperiment_1.32.0
## [9] Biobase_2.62.0 MatrixGenerics_1.14.0
## [11] matrixStats_1.2.0 GenomicRanges_1.54.1
## [13] GenomeInfoDb_1.38.8 IRanges_2.36.0
## [15] S4Vectors_0.40.2 BiocGenerics_0.48.1
## [17] vssHiC_0.0.0.9000
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 rlang_1.1.3 magrittr_2.0.3
## [4] compiler_4.3.1 vctrs_0.6.5 stringr_1.5.1
## [7] pkgconfig_2.0.3 crayon_1.5.2 fastmap_1.1.1
## [10] backports_1.4.1 XVector_0.42.0 labeling_0.4.3
## [13] utf8_1.2.4 rmarkdown_2.26 tzdb_0.4.0
## [16] ggbeeswarm_0.7.2 strawr_0.0.91 purrr_1.0.2
## [19] bit_4.0.5 xfun_0.43 haarfisz_4.5.4
## [22] zlibbioc_1.48.2 cachem_1.0.8 wavethresh_4.7.2
## [25] jsonlite_1.8.8 highr_0.10 rhdf5filters_1.14.1
## [28] DelayedArray_0.28.0 Rhdf5lib_1.24.2 BiocParallel_1.36.0
## [31] jpeg_0.1-10 cluster_2.1.6 broom_1.0.5
## [34] parallel_4.3.1 R6_2.5.1 bslib_0.6.1
## [37] stringi_1.8.3 car_3.1-2 jquerylib_0.1.4
## [40] Rcpp_1.0.12 knitr_1.45 readr_2.1.5
## [43] Matrix_1.6-5 tidyselect_1.2.1 rstudioapi_0.16.0
## [46] abind_1.4-5 yaml_2.3.8 codetools_0.2-20
## [49] lattice_0.22-6 tibble_3.2.1 withr_3.0.0
## [52] evaluate_0.23 ggrastr_1.0.2 PRIMME_3.2-6
## [55] pillar_1.9.0 carData_3.0-5 generics_0.1.3
## [58] vroom_1.6.5 RCurl_1.98-1.12 hms_1.1.3
## [61] munsell_0.5.1 scales_1.3.0 glue_1.7.0
## [64] tools_4.3.1 BiocIO_1.12.0 data.table_1.15.4
## [67] RSpectra_0.16-1 locfit_1.5-9.9 ggsignif_0.6.4
## [70] Cairo_1.6-2 cowplot_1.1.3 grid_4.3.1
## [73] tidyr_1.3.1 colorspace_2.1-0 GenomeInfoDbData_1.2.11
## [76] beeswarm_0.4.0 vipor_0.4.7 cli_3.6.2
## [79] fansi_1.0.6 S4Arrays_1.2.1 dplyr_1.1.4
## [82] gtable_0.3.4 rstatix_0.7.2 DESeq2_1.42.1
## [85] sass_0.4.4 digest_0.6.35 SparseArray_1.2.4
## [88] farver_2.1.1 htmltools_0.5.8.1 lifecycle_1.0.4
## [91] HiCseg_1.1 bit64_4.0.5 MASS_7.3-60.0.1