INDEX

Introduction
The role of chromatin in gene regulation
Chromatin Assays
3D Chromatin assay
Accessibility Assays
DNase-seq
ATAC-seq
FAIRE-seq
NicE-seq
MNase-seq
ChIP-seq
Bioinformatics analysis
Motif discovery
Finding and predicting regulatory elements
Cis-regulatory elements from accessibility assays
Motifs
Footprints
Nucleosome positioning
Histone modifications
Chromatin states
Databases
The implementation of chromatin datasets
Conclusions and perspectives
References

Word count: 5432

Introduction

Modelling the data obtained from chromatin assays can provide a visual landscape of the genetic and epigenetic interactions occurring in cells or tissues at a specific time point1. Several computational programs have been developed to process these datasets, fill gaps, find constants, and predict the possible role of epigenetic elements in gene regulation. This review explores how chromatin datasets can predict transcription factor activity in different contexts and how this information has been leveraged to provide insights into normal, developmental and disease conditions.

DNA is packaged in the cell nucleus: 146 base pairs (bp) are coiled around a histone octamer formed by two copies each of H2A, H2B, H3 and H4. The resulting structure is referred to as the nucleosome2, which can form different arrangements. The basic level, the 10 nm fibre or 'beads on a string', is loose and consists of nucleosomes connected in series, separated by about 60 bp of linker DNA. Further compaction is tighter and involves histone H1 (the linker histone), heterochromatin protein 1 (HP1), Mg2+ and other positively charged molecules 3. Compaction may generate a 30 nm fibre arrangement (though this is disputed by some authors 4,5) or nucleosome clutches, probably depending on the external ion concentration 6. Ultimately, these arrangements lead to chromatin condensation and the formation of chromosomes during cell division (Figure 1).

Figure 1. Chromatin structure and arrangement. Image modified from 7.

Nucleosome arrangements shape the chromatin structure, which is categorised into two principal interchangeable states: euchromatin, which is open, contains nucleosome-depleted regions (NDR) or nucleosome-free regions (NFR) and is relaxed for transcription; and heterochromatin, which is compacted, closed, repressed or transcriptionally inactive. These states play an essential role in the accessibility of DNA to regulators of transcription, DNA replication, recombination and repair 8. Changes in chromatin structure can be regulated by histone post-translational modifications (PTM). The histone N-terminal tails are rich in positively charged amino acids such as lysine and arginine, which are prone to biochemical modifications such as acetylation, methylation, phosphorylation, sumoylation or ubiquitination 9. These PTM affect chromatin compaction; for example, it has been proposed that acetyl groups neutralise the positive charges of these amino acids, loosening the DNA-histone interaction and making the DNA more accessible for transcription, whereas deacetylation tends to compact the chromatin, making it less accessible 10. RNA-chromatin interactions have shown similar outcomes 11. Besides the change of charge that histone PTM may generate, their main role is to function as docking sites for proteins that recognise them (readers) and recruit ATP-dependent enzymes (chromatin remodellers) 12,13. Remodellers alter nucleosome arrangements by reorganising, ejecting or sliding nucleosomes, or by interacting with other proteins to replace histones with variants (H2A.X/Z/B/1H Macro-H2A, H3.3)14,15. These alterations allow the binding of transcription factors (TFs) to DNA 16, regulate the enhancer landscape 17 or provide genome stability after incidental damage 8. The wide range of proteins that lay down (writers), read (readers) and remove (erasers) these functional groups are group- and site-specific.
Group-specific refers to enzymes that interact with specific biochemical marks. For instance, acetylation is laid down by histone acetyltransferases (HATs), removed by histone deacetylases (HDACs) and read by proteins containing bromodomains 18, whereas histone methylation is laid down by methyltransferases, removed by histone demethylases and read by proteins containing chromodomains. In addition, methylation laid down by a specific complex (Polycomb repressive complex 2, PRC2) is associated with heterochromatin states19. Site-specific refers to the amino acid to which a specific PTM is attached and its potential functional role, alone or in combination with others. Histone acetylation has been widely correlated with active transcriptional outcomes, with some marks highly correlated with active gene promoters (H3K9ac), enhancers (H3K27ac) and early replication states (H3K9ac, H3K27ac). In contrast, histone methylation has been correlated with both active and repressive transcription states. Some active transcription marks are H3K4me3, H3K36me3, H3K9me3 and H4K20me. Marks associated with enhancer regions are H3K4me1,2, H2K36me, H3K27me and H2K5me. Some associated with early replication are H3K4me1,2,3 and H4K20me, while some found in late replication are H3K9me2 and H3K9me319. Understanding the position and dynamics of histone patterns and their correlation with regulation has allowed the prediction and categorisation of functional regions in developmental stages, cell differentiation 20 and disease. Furthermore, human PTM and their associated functions have been compiled by different authors 21-23 and integrated into databases 24.

The role of chromatin in gene regulation

The information that regulates transcription comprises two kinds of elements: cis and trans. Cis-regulatory elements (CREs) are DNA sequences containing transcription factor binding sites (TFBS). Promoters and proximal CREs are located near the transcription start site (TSS, the point where the transcription machinery assembles)25. In contrast, distal CREs such as enhancers, super-enhancers, insulators, silencers and locus control regions (LCR) lie tens to thousands of kilobases upstream or downstream of the TSS, sometimes within the target gene26. CREs are recognised by trans elements, referred to as general transcription factors (GTF), which involve a broad range of promoter-activator proteins (activators), coactivators and mediators. To allow RNA polymerase II (Pol II) transcription of protein-coding genes, a transcription complex must form at the TSS. Since nucleosomes or other DNA-binding proteins may occlude CREs, the TSS and promoters of active genes are often characterised by NFR. Silent genes have also shown open and accessible CREs, but these have been associated with regulation through repressive histone marks such as H3K27me3 27 and key silencer TFs 28. It has therefore been suggested that nucleosome occupancy is not strictly correlated with repressive or active states 29; instead, the states are influenced by histone variants (H2A.Z and H3.3), histone PTM correlated with nucleosome turnover (H3K4me), or short unstable RNA transcripts associated with enhancer activity (eRNAs)30. In 2019, four models were proposed to explain how accessibility is established by the action of DNA, TFs and remodellers 26. The first is parsimonious competition between TFs and nucleosomes, in which TFs at higher concentration can bind DNA and recruit stabilising cofactors (possible only in euchromatin).
The second is the displacement of cis-proximal H1 and architectural proteins, where TFs bind to linker DNA and destabilise them, spreading the effect to proximal nucleosomes. The third is 'TF-trans control', where TFs bind to distal CREs and recruit other cofactors that evict nucleosomes in trans, forming a loop for enhancer-promoter interactions 31. The fourth is the binding of pioneer TFs to nucleosomal DNA, which promotes displacement of the histone core directly or through the recruitment of chromatin remodellers to establish an open chromatin state. It has also been proposed that torsional stress may trigger nucleosome eviction so that transcription can proceed 32. Chromatin is not static but is dynamically condensed and decondensed to enable transcriptional activity. Transcription involves the formation of an initiation complex and the availability and interaction of CREs, TFs and coactivators, with the participation of histone PTM and chromatin remodeller complexes. Gene transcription regulation is therefore partly mediated by chromatin 31.

Chromatin Assays

Chromatin has been studied through analytical experiments that reflect its organisation33, biochemical marks, associated proteins and arrangements34. Its organisation has been studied with chromosome conformation capture-based techniques; for example, the Hi-C method captures interactions between all chromatin fragments simultaneously, producing genome-wide interaction maps35. Associated proteins and histone marks have been studied with ChIP-seq experiments36 (although mass spectrometry and SELEX methods can also investigate them). Accessible chromatin can be studied by ATAC-seq, DNase-seq, NicE-seq and FAIRE-seq experiments. Finally, nucleosome positioning can be visualised using ATAC-seq and MNase-seq1,37. Integrating the information provided by these assays leads to the identification of CREs and the subsequent creation of chromatin state maps38 that, according to the histone marks, associated proteins or accessible regions, reveal gene activity and its variability among cell types, tissues, developmental stages or disease conditions. These assays and their improvements are reviewed in the following sections.

3D Chromatin assay

One widely used technique to study the spatial organisation of chromatin is Hi-C. The traditional technique uses formaldehyde to cross-link interacting DNA regions, followed by restriction digestion and the incorporation of biotin-linked nucleotides to fill in the cut ends. The ends are then proximity ligated, and the cross-linking is reversed, as is the biotin labelling. The resulting fragments are paired-end sequenced. Modifications and improvements that remove steps such as the biotin labelling have been proposed 35, together with specialised computational software for data processing 33.
Likewise, protocols for single-cell analysis are available 39, with improved computational analysis that accounts for their possible biases40. In general terms, data from all Hi-C assays undergo a quality control process in which low-quality and spurious signals are filtered out, and the reads are aligned to a reference genome, assigned to bins and normalised. The programs used to analyse and visualise Hi-C data often already integrate these steps. Although the alignment can be performed with common alignment software such as Bowtie, Hi-C has its own specific packages that prioritise visualisation resolution (Juicebox) 41, multiscale contact maps (HiGlass)42 or mitigation of sequencing depth limitations when detecting loops (coolpup.py) 43, or that are adapted to work with single-cell Hi-C data (Galaxy HiCExplorer)40. Hi-C analysis shows the interactions and how they may affect genome function in rectangular or triangular heatmaps, or in arc or circular plots, with static or interactive zooming depending on the program used. Hi-C research has allowed chromatin to be categorised into compartments (A and B) related to active and inactive transcription states, and has revealed chromosome territories and the interactions of genes in relation to their spatial position44.
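The binning and normalisation steps described above can be sketched in a few lines. This is a minimal illustration only, assuming read pairs already aligned to a single chromosome and given as coordinate pairs; the function names and the single-pass square-root coverage normalisation are choices made for this sketch and do not reproduce the method of any of the packages cited.

```python
def bin_contacts(pairs, bin_size, n_bins):
    """Count aligned read pairs per (bin_i, bin_j) in a symmetric contact matrix."""
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        if i >= n_bins or j >= n_bins:
            continue  # skip pairs falling outside the region of interest
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


def coverage_normalise(matrix):
    """Divide each cell by sqrt(row_sum * column_sum); bins with no reads stay at 0."""
    sums = [sum(row) for row in matrix]
    n = len(matrix)
    normalised = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if sums[i] > 0 and sums[j] > 0:
                normalised[i][j] = matrix[i][j] / (sums[i] * sums[j]) ** 0.5
    return normalised


# Hypothetical toy input: three read pairs on one chromosome, 100 bp bins.
toy_pairs = [(10, 20), (10, 150), (120, 180)]
raw = bin_contacts(toy_pairs, bin_size=100, n_bins=2)
balanced = coverage_normalise(raw)
```

Production tools typically iterate the correction until row sums converge and work on sparse matrices; the single pass here only shows the idea of correcting each contact for the coverage of its two bins.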

Accessibility Assays

DNase-seq

Regions of DNA hypersensitive to the enzyme DNase I (EC 3.1.21.1) have been used to identify regulatory elements such as promoters, insulators, enhancers, locus control regions and silencers45 within open chromatin regions. The method is based on the digestion of chromatin by DNase I and the subsequent attachment of biotinylated linkers to the cleaved ends. The DNA fragments can then be identified by next-generation sequencing (NGS). Some protocols work on single cells (scDNase-seq)45,46. The limitations of the method are mostly related to the millions of cells required and to data processing biases, although many bioinformatics tools have addressed the latter.

ATAC-seq

The assay for transposase-accessible chromatin followed by sequencing (ATAC-seq) was developed 47 to detect open/accessible chromatin regions using the Tn5 transposase, which cuts accessible regions of chromatin and attaches adapters to them. Although initially created to process bulk populations of cells, it has been rapidly and continually improved 48,49: for the analysis of single cells (scATAC-seq) 50,51, for use with mitochondrial genetic material (mtsATAC-seq)52, for suitability with diverse cell and tissue samples, including frozen ones (Omni-ATAC) 53, or for greater efficiency by using intact nuclei during droplet barcoding (dscATAC-seq) 54. In addition, simulation programs (simATAC-seq) have been released to generate in silico scATAC-seq samples; they estimate the parameters of the read distributions and generate a count array that depicts the regulatory landscape of cells sharing similar characteristics55.

FAIRE-seq

The Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) protocol was introduced in 2009 56 to isolate genomic regions depleted of nucleosomes. It is based on cross-linking proteins to DNA with formaldehyde, shearing the chromatin by sonication and extracting the DNA, followed by NGS (qPCR is suggested for identifying individual loci). The main limitation of this method is the large number of cells required. In 2018, Segorbe et al. 57 released a protocol that works with 100-fold fewer cells in yeast; however, it has not been tested on human tissues. Given the low cost of the assay, Seuter et al. 58 described the laboratory issues that this method may present and proposed potential solutions. The authors also made recommendations on when FAIRE-seq is more appropriate than ATAC-seq, which depends on the experiment, since FAIRE-seq allows neither footprinting nor nucleosome positioning analysis.

NicE-seq

Nicking enzyme assisted sequencing (NicE-seq) is another assay proposed for high-resolution open chromatin profiling in living and formaldehyde-fixed cells. Fixed cells require cross-linking with formaldehyde and biotin labelling, whereas native cells do not. The samples are then incubated with the nicking enzyme Nt.CviPII, which nicks DNA with sequence specificity (CC-A/G/T). Afterwards, the genomic DNA containing the putative open chromatin regions is purified, fragmented and captured for library construction. The method was compared with ATAC-seq and DNase-seq data, showing similar peaks with overlaps of over 70%. The advantage of the method is its robustness across different cell preparations 59. It has also been improved for a wide variety of mammalian cells and tissues (UniNicE-seq)60.

MNase-seq

Micrococcal nuclease digestion followed by NGS (MNase-seq) is an assay that has been used to identify nucleosome positioning 61. It uses the endo-exonuclease MNase (EC 3.1.31.1) to digest the naked DNA found between nucleosomes. After digestion, the nucleosomes are released from chromatin and the protected fragments are sequenced. The sequences obtained provide nucleosome occupancy profiles37 and can also be related to areas of chromatin accessibility29. However, this method has been criticised because the results depend on the level of digestion and might not reveal an accurate relation between nucleosome occupancy and DNA accessibility; to correct this issue and improve quality control (QC), the MACC protocol 29 and the CAM pipeline have been proposed62. Furthermore, recent protocols have been published to decipher the nucleosome landscape more accurately 37, though their use on human cells is still untested.

ChIP-seq

Chromatin immunoprecipitation coupled with sequencing (ChIP-seq) is a technique used to profile histone modifications, transcription factors and nucleosome positioning 63. The method involves cross-linking protein-DNA interactions with formaldehyde; the complexes are then sheared by sonication or nuclease treatment to obtain small DNA fragments. Antibodies against the DNA-binding protein or histone modification of interest are added. The DNA is then released from the proteins and analysed using NGS 64. Several protocols modifying the traditional method have been published. They aim to eliminate the cross-linking or sonication steps (CUT&RUN)65 and adopt the Tn5 enzyme (CUT&Tag) 66; to use lower cell numbers, from 1,000 cells (ChIL-seq 67) to fewer than 100 cells (itChIP-seq 34, MOWChIP 68, nanoChIP-seq 69, ULI-NChIP); to study marks or TFs at the individual cell level (scChIP-seq 70, CoBATCH71); and to reduce potential false negatives or positives (Exo-ChIP-seq) 72 while improving quality control 73. It is important to note that, to detect histone marks, specific adaptations of the protocol have emerged that use microfluidic chambers, such as LIFE-ChIP 74, MOWChIP75 and SurfaceChIP-seq76. The choice of method depends on the regulatory mark to be studied, the availability of reagents, the number of cells available for analysis and the biases that each assay may show. Nevertheless, DNase-seq and ATAC-seq for chromatin accessibility profiling, and ChIP-seq, have been the most frequently used.

Bioinformatics analysis

NGS instruments release several gigabases of data per experimental run, which are processed according to the analysis to be performed. However, all NGS data first undergo common processing steps. The raw data may contain artefacts: read mistakes, base-calling errors, insertions, deletions, poor-quality reads and contamination with primers or adaptors. These errors affect downstream analysis, so the reads are subjected to quality control. Once this is done, the reads are aligned to the reference genome. Alignment is the process in which two or more sequences (here, the reference sequence and the NGS reads) are compared and edited using gaps and substitutions to maximise their similarity77, which is measured by assigning positive scores to matches and penalties (negative scores) to mismatches 78. The University of California Santa Cruz (UCSC) Genome Browser and the Genome Reference Consortium (GRC) are sources for the human reference genome. After alignment, a second QC step removes improperly paired reads, reads in blacklisted regions79,80 and duplicated reads arising from PCR artefacts. Once the sequences are aligned and filtered, a 'peak calling' analysis is performed. Depending on the assay, peak calling normalises and counts the fragments and compares them against other regions or samples to find enrichments related to promoters, enhancers or TFs.
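The scoring idea behind alignment (positive scores for matches, penalties for mismatches and gaps) can be illustrated with a small dynamic-programming sketch. It computes a global alignment score in the Needleman-Wunsch style; the score values (+1, -1, -2) are illustrative assumptions, and real short-read aligners use indexed, heuristic algorithms rather than this full table.

```python
def global_alignment_score(ref, read, match=1, mismatch=-1, gap=-2):
    """Best global alignment score between two sequences (Needleman-Wunsch style)."""
    rows, cols = len(ref) + 1, len(read) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap penalty per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if ref[i - 1] == read[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two bases
                score[i - 1][j] + gap,       # gap in the read
                score[i][j - 1] + gap,       # gap in the reference
            )
    return score[-1][-1]


perfect = global_alignment_score("ACGT", "ACGT")       # four matches
one_mismatch = global_alignment_score("ACGT", "AGGT")  # three matches, one mismatch
one_gap = global_alignment_score("ACGT", "ACT")        # three matches, one gap
```

With these illustrative parameters a single gap costs more than a single mismatch, so the read with a deleted base scores lower than the read with a substituted base.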

Motif discovery

The goal of experiments that analyse protein-DNA interactions, such as ChIP-seq and SELEX, is to discover the sequence for which the TF has affinity. Peak calling serves this purpose: active binding sites appear as peaks in enrichment profiles, and the intensity of a peak reflects the frequency of interaction between the TF and the nucleotides it binds (interaction strength). Some software records the frequency of each nucleotide at each position of the site in position weight matrices (PWM), which can then be visualised as a sequence logo (Figure 2). Although MEME-ChIP81 and HOMER82 have been widely used, more recent software for de novo motif discovery includes MotifHyades, based on expectation maximisation 83; BaMM, based on a Bayesian Markov model84; SamSelect 85; and STREME, which have demonstrated highly accurate performance compared with previous state-of-the-art software such as MEME-ChIP and HOMER 86.
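To make the PWM concept concrete, the sketch below builds a log-odds matrix from a handful of aligned binding sites and scores candidate sequences against it. The example sites, the pseudocount of 1 and the uniform background of 0.25 are assumptions made for illustration; the tools cited above estimate motifs with far more elaborate statistical models.

```python
import math

BASES = "ACGT"


def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Return one dict per motif position holding log2(frequency / background)."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_sequence(pwm, seq):
    """Sum the log-odds of each base of seq at its motif position."""
    return sum(column[base] for column, base in zip(pwm, seq))


# Hypothetical aligned sites, invented for this example.
example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
example_pwm = position_weight_matrix(example_sites)
consensus_score = score_sequence(example_pwm, "TGACTCA")
random_score = score_sequence(example_pwm, "AAAAAAA")
```

A sequence logo is essentially a plot of the same per-position frequencies, scaled by the information content of each column.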

Data from experiments that assess the binding affinity of TFs for DNA sequences (SELEX, ChIP-seq, MNase-seq) are stored in databases such as JASPAR87, CIS-BP, TRANSFAC and HOCOMOCO88. Moreover, Gene Ontology, the Reactome Pathway Database and the Broad Institute's MSigDB contain sets of genes associated with their function.

Figure 2. Motif discovery process from ChIP-seq and SELEX. Own elaboration, created in BioRender.

Finding and predicting regulatory elements

Cis-regulatory elements from accessibility assays

Motifs

Accessible chromatin assays can provide information about the transcription factor binding sites (TFBS) present in chromatin open regions (COR). Many methods now exist to identify and predict them. One method screens the COR for putative TFBS using known transcription factor binding motifs in PWM form as input89. These reference motifs are obtained from the JASPAR, HOCOMOCO or TRANSFAC databases and provide the binding affinity of the TF for the DNA sequence (motif), which is used to predict putative TFBS indirectly from DNA accessibility experiments 84,90. Nevertheless, the length of the PWM is limited and might be insufficient when only incomplete or half sites are identified. Besides, shorter sequences are prone to false positives, and degenerate regions are usually not considered91.
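The motif-screening approach can be sketched as a sliding-window scan of an accessible region on both strands. The toy four-position matrix below is invented for illustration (it rewards the sequence GATA and penalises everything else), and the region is assumed to contain only A, C, G and T; a real scan would load a matrix from one of the databases named above and choose a statistically justified threshold.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_region(region, pwm, threshold):
    """Report (start, strand, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(region) - width + 1):
        window = region[start:start + width]
        for strand, seq in (("+", window), ("-", reverse_complement(window))):
            score = sum(column[base] for column, base in zip(pwm, seq))
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


# Hypothetical toy matrix: +2 for the consensus base, -2 otherwise.
toy_pwm = [{b: (2.0 if b == consensus else -2.0) for b in "ACGT"} for consensus in "GATA"]
toy_hits = scan_region("CCGATACCTATCGG", toy_pwm, threshold=8.0)
```

Lowering the threshold quickly increases the number of hits, which reflects the false-positive problem of short motifs mentioned above.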

Footprints

A second method searches for footprints within the COR and matches the sites to putative TFs. A footprint is the protected pattern that a TF leaves when it binds to DNA, preventing cleavage by Tn5 or DNase I. The sites can be discovered by visualising the footprint across peaks and matching known motifs or identifying novel ones 92 (Figure 3). However, some authors consider that a drawback of this method is the ambiguity that may arise when assigning TFs to individual footprints93.

Figure 3. Motif identification from chromatin accessibility datasets. Own elaboration, created in BioRender.
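A footprint can be quantified very simply as the drop in cleavage over the motif relative to its flanks. The sketch below is a deliberately naive version of that idea, assuming a list of per-base cut counts for one peak; the function name and the 10 bp flank are hypothetical choices, and published footprinting tools additionally correct for enzyme sequence bias.

```python
def footprint_depth(cuts, motif_start, motif_end, flank=10):
    """Mean cut count in the flanks minus mean cut count inside the motif.

    cuts is a list of per-base cleavage counts; the motif occupies
    cuts[motif_start:motif_end]. Positive values suggest protection.
    """
    left = cuts[max(0, motif_start - flank):motif_start]
    right = cuts[motif_end:motif_end + flank]
    core = cuts[motif_start:motif_end]
    flanks = left + right
    if not flanks or not core:
        raise ValueError("motif and flanks must each cover at least one base")
    return sum(flanks) / len(flanks) - sum(core) / len(core)


# Hypothetical cut profile: high cleavage around a protected 6 bp site.
protected_profile = [8] * 10 + [1] * 6 + [8] * 10
flat_profile = [5] * 26
protected_depth = footprint_depth(protected_profile, 10, 16)
flat_depth = footprint_depth(flat_profile, 10, 16)
```

A depth near zero, as in the flat profile, indicates an accessible site with no evidence of protein occupancy.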

Currently, the majority of methods integrate additional data to discover TFBS, using the motif match score, sequence conservation length or CpG islands (Mocap)91. Others add genome annotations of gene expression, such as FactorNet 94, DeFCoM 95 and CENTIPEDE 96; associate eRNAs30; consider changes in DNA accessibility (BaGFoot)97; or infer TF activity (DAStk)98. Methods that consider gene expression have even been awarded in the ENCODE-DREAM challenge (Anchor)99,100. Although these programs can work with both DNase-seq and ATAC-seq, some have been developed to process only ATAC-seq reads (HINT-ATAC) 101 because of the biases specific to each protocol. A global analysis of footprints in the human genome has recently identified around 4.5 million sequences 93. Although it relied on an incomplete sampling of tissues, it might serve as a good reference for the potential presence of TFBS in distinct cell types. However, a complete footprint analysis covering the most accurate sites across several tissues and cell types is still lacking.

Integrating the prediction of multiple regulatory elements has allowed the development of computational programs that consider potential motif mutations (MAGGIE) 102 or that model and infer gene regulatory interactions from high-throughput data, such as ISMARA 103, LISA 104 or SMITE105. SMITE, however, requires additional data such as interaction networks, gene annotations and statistical tests from epigenomic profiles to model network modules. TFBS identification allows the subsequent prediction of enhancers and super-enhancers. Enhancers have been identified experimentally by knocking out the enhancer sequence and observing the resulting reduction in gene expression106, or by applying protocols such as STARR-seq, in which the genome is sheared and the fragments are inserted into plasmids to detect enhancer activity 107. The resulting categorisation is then used to make predictions 108.
ENCODE phase 3 has identified 926,535 potential CREs109. CRE predictions, on the other hand, use computational methods that focus on identifying motif sequences conserved through evolution110 or that use experimentally identified CREs as a reference. However, since enhancer activity is associated with histone modifications and chromatin accessibility, some software predicts enhancers from combinations of features such as DNase I hypersensitive sites (DHS), the histone modification code 111 or known TFs. For example, super-enhancers (those that regulate cell identity genes and confer high expression) have been predicted by identifying enrichment of H3K27ac or binding of specific TFs112.

Nucleosome positioning

ATAC-seq and MNase-seq experiments yield fragments that reflect nucleosome positioning. In ATAC-seq, fragments shorter than 100 bp are expected to cluster immediately upstream of the TSS, whereas fragments of ~200, ~400 or ~600 bp, corresponding to mono-, di- and tri-nucleosomes, are expected to be depleted from the TSS and to display periodic peaks upstream and downstream of it 113. Nucleosome profiles can provide information about occupancy within regulatory regions across a population of cells, which can be heterogeneous in expression114. Regulatory sequences may thus be hidden by nucleosomes at certain moments and exposed at others, affecting transcription. Although some authors suggest that software developed for MNase-seq, such as DANPOS2115 and PuFFIN116, can be used for ATAC-seq90, other tools are specific to it, such as NucleoATAC and HMMRATAC117,118. The data are visualised in V-plots, in which fragments are arranged according to their length around the TFBS (Figure 4).

Figure 4. Representation of a V-plot. Panel A illustrates an arrangement in which two pairs of nucleosomes delimit a chromatin open region interrupted by a TF, and the order of the fragments in that situation. Panel B represents the characteristic V-plot pattern, where 0 corresponds to the TFBS. Obtained from 1.
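The fragment-length classes used in ATAC-seq nucleosome analysis can be expressed as a small classifier. The 100 bp cut-off and the ~200 bp spacing come from the text above; the ±50 bp tolerance and the class labels are assumptions made for this sketch, and dedicated tools fit the length distribution rather than using fixed windows.

```python
from collections import Counter

NFR_MAX_BP = 100          # fragments shorter than this are treated as nucleosome-free
NUCLEOSOME_UNIT_BP = 200  # approximate mono-nucleosome fragment length


def classify_fragment(length, tolerance=50):
    """Assign a fragment length to a nucleosome class, or 'unassigned'."""
    if length < NFR_MAX_BP:
        return "nucleosome_free"
    for n, label in ((1, "mono"), (2, "di"), (3, "tri")):
        if abs(length - n * NUCLEOSOME_UNIT_BP) <= tolerance:
            return label
    return "unassigned"


def summarise_fragments(lengths):
    """Count how many fragments fall into each class."""
    return Counter(classify_fragment(length) for length in lengths)


# Hypothetical fragment lengths from paired-end alignments.
fragment_summary = summarise_fragments([50, 80, 200, 390])
```

Plotting the positions of each class relative to the TSS or a TFBS gives the periodic pattern and the V-plot described above.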

Histone modifications

Once ChIP-seq datasets for histone marks are processed, the histone PTM are often depicted as pile-ups of peaks across the genome. Read counts in promoter regions, which are predefined as ±2 kb around the TSS, identify marked promoters. Bivalent promoters show overlapping marks such as H3K4me3 and H3K27me3 over at least 400 bp 76. Enhancers and super-enhancers are predicted with computer programs such as ROSE, which stitches enhancers within 15 kb of each other and excludes those within 2 kb of an annotated TSS. There are also methods specific to histone modification data that learn mapping patterns to produce high-quality histone ChIP-seq data, such as Coda 119.
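The stitching logic described for enhancer prediction can be sketched as follows. This is not the ROSE implementation: it only illustrates, for peaks on a single chromosome, the two rules mentioned above (dropping peaks near a TSS and merging neighbouring peaks), with the distances exposed as parameters and set to the values quoted in the text.

```python
def stitch_enhancers(peaks, tss_positions, stitch_distance=15_000, tss_exclusion=2_000):
    """Merge distal peaks lying within stitch_distance of each other.

    peaks is a list of (start, end) tuples on one chromosome; peaks with a
    TSS inside them or within tss_exclusion bp of either edge are discarded.
    """
    distal = [
        (start, end)
        for start, end in sorted(peaks)
        if not any(start - tss_exclusion <= tss <= end + tss_exclusion
                   for tss in tss_positions)
    ]
    stitched = []
    for start, end in distal:
        if stitched and start - stitched[-1][1] <= stitch_distance:
            stitched[-1][1] = max(stitched[-1][1], end)  # extend the current region
        else:
            stitched.append([start, end])                # open a new region
    return [tuple(region) for region in stitched]


# Hypothetical H3K27ac peaks and one annotated TSS.
toy_peaks = [(1_000, 1_500), (10_000, 10_500), (20_000, 20_800), (60_000, 60_500)]
stitched_regions = stitch_enhancers(toy_peaks, tss_positions=[1_200])
```

Super-enhancer callers then rank the stitched regions by their ChIP-seq signal and keep those above an inflection point of the ranked curve, a step not shown here.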

Chromatin states

The identification of PTM (especially acetylation and methylation) and their correlation with specific CREs such as enhancers and promoters, and with trans factors such as TBP-associated factor 1 (TAF1), RNA polymerase II (RNAPII) and the p300 enzyme, laid the basis for the categorisation of data in the form of chromatin states 120 and for their potential to predict regulatory elements in the human genome 121. As these studies were extended, specific features came to characterise regulatory elements: the H3K4me1 mark for primed enhancers; H3K4me and H3K27ac for active enhancers; H3K4me3 relative to H3K4me1 for promoters; H3K36me3 with RNAPII for transcribed regions; and H3K27me3 and H3K9me3 for repressive chromatin states36 (Figure 5). Even the use of one simple signature122 has allowed differences in gene expression to be found between disease and normal samples 123. However, integrating more regulatory elements could reveal the biological differences between pathological and non-affected tissues more accurately. Combining all epigenetic marks has allowed wider systematic annotations 124 capable of reflecting their probability of occurring at promoters, TSS, transcribed promoters, transcribed regions, strong enhancers, enhancers, heterochromatin, insulator regions and satellite repeats, providing a landscape of states that change as cell fate is established. On account of their utility, improved computational methods to depict chromatin states more accurately have been developed so far125-130,38,51,131.

Figure 5. Representation of chromatin states according to their representative marks. Image obtained from 36, unedited.
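The mark-to-state associations listed above can be written as a simple ordered rule set. This is a toy illustration only: the state labels, the rule order and the use of H3K4me3 with H3K27me3 for bivalent promoters are assumptions of this sketch, whereas the published chromatin state methods learn states probabilistically from genome-wide signal rather than from fixed rules.

```python
def call_state(marks):
    """Return a coarse chromatin state label for the set of marks found in a region."""
    marks = set(marks)
    if {"H3K4me3", "H3K27me3"} <= marks:
        return "bivalent_promoter"
    if "H3K4me3" in marks:
        return "promoter"
    if {"H3K4me1", "H3K27ac"} <= marks:
        return "active_enhancer"
    if "H3K4me1" in marks:
        return "primed_enhancer"
    if "H3K36me3" in marks:
        return "transcribed"
    if marks & {"H3K27me3", "H3K9me3"}:
        return "repressed"
    return "unmarked"


# Hypothetical regions with the marks detected in each.
region_marks = {
    "region_1": ["H3K4me1", "H3K27ac"],
    "region_2": ["H3K9me3"],
    "region_3": ["H3K4me3", "H3K27me3"],
}
region_states = {name: call_state(marks) for name, marks in region_marks.items()}
```

Because the rules are evaluated in order, a region carrying both promoter and enhancer marks is labelled as a promoter, which is one of the simplifications a probabilistic model avoids.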

Chromatin states have allowed the visualisation and discrimination of epigenetic features: those that differ between sexes 128, the epigenetic landscape of X chromosome inactivation across tissues and the genes affected by it 132, and the epigenomic features that differentiate tissues, such as gastrointestinal versus brain, foetal versus adult, or tumour versus normal samples 128. Chromatin profiling has also allowed the identification of epigenetic differences in diseased cells with or without treatment; for example, Grosselin et al. found loss of H3K27me3 both in chemotherapy-resistant cells and in a subset of untreated cells ('resistant-like cells'), suggesting that the epigenetic features typical of treatment-resistant cells are already present in some tumour cells 133 and could serve to predict the response of some tumours to treatment before it is administered.

Databases

Public research consortia have enabled the storage and identification of functional elements discovered throughout the human genome. ENCODE 134, Roadmap Epigenomics135 and the International Human Epigenome Consortium (IHEC)136 provide databases containing data from numerous chromatin experiments, and multiple analyses and predictions of regulatory regions have been conducted from them. Databases and data browser (DB) platforms focus on storing predicted epigenetic marks in order to standardise and process human epigenetic regulatory data effectively and classify it according to cell, tissue or disease condition38. Some DB have been developed specifically to explore trans-regulatory elements (TFs): the hTFtarget database contains 3.4 million predicted records of candidate TF-target regulations obtained from ChIP-seq data alone 137. KnockTF provides, besides TFs, details about binding at CREs, although from fewer TF samples 138. Others have focused on cis-regulatory elements (lncRNAs) 139, based on enhancer (ENdb) 140 or super-enhancer (SEanalysis)141 data. One of particular relevance is ATACdb, since it contains chromatin accessibility data processed together with enhancers, super-enhancers, TFs, SNPs, eQTLs, methylation sites, chromatin interactions, TF footprints and their relation to gene expression, in order to annotate and illustrate the potential roles of CREs per tissue or cell 142. Although the specific analysis of trans- or cis-regulatory elements is relevant, the integration of multiple omics techniques to show a wider regulatory landscape has also been proposed. The Cistrome DB was launched to provide curated and processed ChIP-seq, DNase-seq and ATAC-seq data143 that depict cis-regulatory information 144. Nonetheless, its QC has not been especially robust, and misannotation, ambiguity or incompleteness of the data is to be expected 142.
Another database that includes several omics experiments is the Gene Transcription Regulation Database (GTRD), which contains processed data from RNA-seq, MNase-seq, ChIP-seq and DNase-seq experiments across cell lines and tissues, and comprises TFBS, histone marks, nucleosome landscapes and gene ontology across cells and tissues 25. The human epigenome reference, EpiMap, provides a compendium of 10,000 epigenomic maps across 800 samples. This epigenome integration database contains chromatin states and cis-regulatory element predictions across the developmental stages of different cells and tissues, with quality control already applied. The limitations of this project are that the tissue samples were not studied at the single-cell level and that tissues, environmental conditions and developmental stages are missing for comparison 38. Nonetheless, it appears to be the most complete DB, with integrated regulatory elements and chromatin state landscapes per tissue and condition.

The implementation of chromatin datasets

The relationship between TFs, gene expression and chromatin accessibility has enabled the study of gene expression in normal conditions, human developmental stages and disease, and has given insights into therapeutics. In studies of normal conditions, chromatin datasets provide insights into the potential epigenetic mechanisms that lead to dynamic changes in gene expression. Su et al. deciphered the shifts in the epigenetic landscape before and after stimulation, characterising the enrichment of the gained open sites and their relation to changes in gene expression 145. Moreover, Tyssowski et al. analysed DHSs, histone PTMs and CpG content in mouse neurons, revealing neuronal activity patterns and the separate activation of enhancers, as indicated by eRNA and H3K27ac marks 146. The creation of atlases of chromatin accessibility and foetal gene expression 147 has identified 657 cell subtypes and potential new TFs that regulate cell fate specification, illustrating expression dynamics from embryonic to foetal developmental stages. Moreover, since the studies are based on scATAC-seq and scRNA-seq, they resolve differences in expression within the same organ. Since the atlas illustrates the regulatory landscape of human development in normal conditions, it can be used to predict gene regulation in other developmental stages or in disease conditions. Regarding disease conditions, epigenetic profiling has been used to compare the transcriptional responses of myeloid cells in 69 different neurodegenerative diseases, providing 336 expression profiles and their categorisation according to the genes that show similar responses. This profiling has suggested a possible association between human Alzheimer's disease and inflammatory signalling (based on the function of the activated genes) 148.
Current studies have shown that profiling chromatin accessibility in combination with other omics assays (RNA-seq, ChIP-seq) has made it possible to categorise cancer subtypes according to their DNA regulatory elements and to discover putative non-coding mutations related to clinical prognosis 146. Likewise, the integration of epigenetic marks such as DNA methylation patterns with ChIP-seq, histone PTM and single-cell transcriptome analyses has helped decipher how the coordination of epigenetic modifications (chromatin states) allows cells to activate alternative gene regulatory pathways that lead to transcriptional heterogeneity in chronic lymphocytic leukaemia (CLL), contributing knowledge about the clinical behaviour of this disease 149. Lastly, epigenetic profiling also helps to reveal the functional consequences of gene mutations. The integration of RNA-seq, ATAC-seq and ChIP-seq datasets elucidated how a specific mutation (SY242CS) is able to open distinct chromatin regions and activate alternative transcriptomes. In contrast, others (Wing2) do not affect chromatin accessibility but can increase protein binding at oestrogen receptor loci upon oestrogen stimulation, providing insights into the mechanism by which FOXA1 mutations perturb its function and influence breast cancer progression and response to therapy 150.
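
The before-and-after comparisons described in this section can be sketched in a few lines. The example below is a toy illustration, not the method of any study cited here: it calls a site "gained open" when its accessibility rises by at least a log2 fold change threshold and then asks what fraction of those sites sit next to an up-regulated gene. The peak and gene identifiers, the counts, the pseudocount and the threshold of 1.0 are all assumptions made for the example.

```python
import math

# Toy sketch: relate gained open chromatin sites to expression changes.
# Identifiers, counts and thresholds are hypothetical example values.


def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of after/before; the pseudocount avoids division by zero."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def gained_open_sites(accessibility, threshold=1.0):
    """Return peak ids whose accessibility log2FC is at or above the threshold."""
    return [
        peak
        for peak, (before, after) in accessibility.items()
        if log2_fold_change(before, after) >= threshold
    ]


# Normalised accessibility counts per peak: (before, after stimulation).
accessibility = {
    "peak_a": (10, 50),
    "peak_b": (30, 31),
    "peak_c": (5, 40),
    "peak_d": (20, 4),
}

# Hypothetical assignment of each peak to its nearest gene.
peak_to_gene = {
    "peak_a": "gene_1",
    "peak_b": "gene_2",
    "peak_c": "gene_3",
    "peak_d": "gene_4",
}

# Normalised expression per gene: (before, after stimulation).
expression = {
    "gene_1": (3, 15),
    "gene_2": (8, 8),
    "gene_3": (12, 11),
    "gene_4": (20, 6),
}

gained = gained_open_sites(accessibility)
upregulated = [
    peak
    for peak in gained
    if log2_fold_change(*expression[peak_to_gene[peak]]) >= 1.0
]
fraction = len(upregulated) / len(gained) if gained else 0.0
print(gained, upregulated, fraction)
```

Real analyses would use replicate-aware statistics rather than a fixed fold change cut-off, and assigning a peak to its nearest gene is only a first approximation of enhancer-gene linkage.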

Conclusions and perspectives

The human body contains trillions of specialised cells that carry out distinct functions despite sharing the same genetic information. Since chromatin structure and its alterations enable gene regulation, studying the epigenome helps determine gene expression outcomes and cell fate 129. Chromatin assays and computational software have provided a more integrative way to visualise and understand the regulatory networks that direct cell function in distinct conditions, whether normal states, disease, developmental stages or treatment 22,38. Experimental assays depend on laboratory conditions such as sample collection, storage and processing, which can affect the results. These effects are addressed by computational tools that aim to provide the most accurate interpretation of the regulatory mechanisms and to predict the dynamics that may occur 36,113. Although each tool has its limitations, together they can provide schemes and frameworks for finding genetic and epigenetic targets for therapeutics. For instance, medicines targeting epigenetic marks, such as the DNA methylation characteristic of certain cancers 151, have already been developed and commercialised. The epigenetic field has been widely studied: protocols have been continually improved to shorten lengthy steps and to work with fewer cells and under single-cell conditions 152,153. In the same way, a growing number of improved computational programs increase accuracy and reduce experimental bias when processing the data.
However, these tools still need to be leveraged in further studies. Single-cell experiments that capture the differences among all tissues, together with the information held in the multiple databases, would give a better understanding of the epigenetic landscapes in developmental stages, normal adult stages and a broader range of diseases. Investigations of therapeutic approaches to the epigenome, and their possible implementation in clinical trials, are also needed for the subsequent development of medicines that target epigenetic features beyond DNA methylation.

References

Albanus, R. D. O. et al. Chromatin information content landscapes inform transcription factor and DNA interactions. Nat Commun 12, 1-12 (2021).
Agbleke, A. A. et al. Advances in Chromatin and Chromosome Research: Perspectives from Multiple Fields. Molecular Cell 79, 881-901, doi:https://doi.org/10.1016/j.molcel.2020.07.003 (2020).
Maeshima, K., Ide, S. & Babokhov, M. Dynamic chromatin organization without the 30-nm fiber. Current Opinion in Cell Biology 58, 95-104, doi:https://doi.org/10.1016/j.ceb.2019.02.003 (2019).
Eltsov, M., Sosnovski, S., Olins, A. L. & Olins, D. E. ELCS in ice: cryo-electron microscopy of nuclear envelope-limited chromatin sheets. Chromosoma 123, 303-312 (2014).
Ou, H. D. et al. ChromEMT: Visualizing 3D chromatin structure and compaction in interphase and mitotic cells. Science 357 (2017).
Maeshima, K., Tamura, S., Hansen, J. C. & Itoh, Y. Fluid-like chromatin: Toward understanding the real chromatin organization present in the cell. Current Opinion in Cell Biology 64, 77-89, doi:https://doi.org/10.1016/j.ceb.2020.02.016 (2020).
Schlick, T., Hayes, J. & Grigoryev, S. Toward Convergence of Experimental Studies and Theoretical Modeling of the Chromatin Fiber. Journal of Biological Chemistry 287, 5183-5191, doi:10.1074/jbc.R111.305763 (2012).
Kollárovič, G., Topping, C. E., Shaw, E. P. & Chambers, A. L. The human HELLS chromatin remodelling protein promotes end resection to facilitate homologous recombination and contributes to DSB repair within heterochromatin. Nucleic Acids Research 48, 1872-1885, doi:10.1093/nar/gkz1146 (2020).
Coetzee, N. et al. Quantitative chromatin proteomics reveals a dynamic histone post-translational modification landscape that defines asexual and sexual Plasmodium falciparum parasites. Scientific Reports 7, 607, doi:10.1038/s41598-017-00687-7 (2017).
Teves, S. S. & Henikoff, S. Transcription-generated torsional stress destabilizes nucleosomes. Nature structural & molecular biology 21, 88-94 (2014).
Dueva, R. et al. Neutralization of the positive charges on histone tails by RNA promotes an open chromatin structure. Cell chemical biology 26, 1436-1449.e1435 (2019).
Dann, G. P. et al. ISWI chromatin remodellers sense nucleosome modifications to determine substrate preference. Nature 548, 607-611, doi:10.1038/nature23671 (2017).
Chatterjee, N. et al. Histone Acetylation near the Nucleosome Dyad Axis Enhances Nucleosome Disassembly by RSC and SWI/SNF. Molecular and Cellular Biology 35, 4083, doi:10.1128/MCB.00441-15 (2015).
Bhattacharya, S. et al. Histone isoform H2A1H promotes attainment of distinct physiological states by altering chromatin dynamics. Epigenetics Chromatin 10, 48-48, doi:10.1186/s13072-017-0155-z (2017).
El Kennani, S. et al. MS_HistoneDB, a manually curated resource for proteomic analysis of human and mouse histones. Epigenetics Chromatin 10, 2, doi:10.1186/s13072-016-0109-x (2017).
Clapier, C. R., Iwasa, J., Cairns, B. R. & Peterson, C. L. Mechanisms of action and regulation of ATP-dependent chromatin-remodelling complexes. Nature Reviews Molecular Cell Biology 18, 407-422, doi:10.1038/nrm.2017.26 (2017).
Alver, B. H. et al. The SWI/SNF chromatin remodelling complex is required for maintenance of lineage specific enhancers. Nat Commun 8, 14648, doi:10.1038/ncomms14648 (2017).
Li, K. et al. Interrogation of enhancer function by enhancer-targeting CRISPR epigenetic editing. Nat Commun 11, 485, doi:10.1038/s41467-020-14362-5 (2020).
Carlberg, C. & Molnár, F. in Human epigenomics 75-88 (Springer, 2018).
Goudarzi, A. et al. Dynamic Competing Histone H4 K5K8 Acetylation and Butyrylation Are Hallmarks of Highly Active Gene Promoters. Molecular Cell 62, 169-180, doi:https://doi.org/10.1016/j.molcel.2016.03.014 (2016).
Zhao, Y. & Garcia, B. A. Comprehensive catalog of currently documented histone modifications. Cold Spring Harbor perspectives in biology 7, a025064 (2015).
Alaskhar Alhamwe, B. et al. Histone modifications and their role in epigenetics of atopy and allergic diseases. Allergy, Asthma & Clinical Immunology 14, 39, doi:10.1186/s13223-018-0259-4 (2018).
Andrés, M., García-Gomis, D., Ponte, I., Suau, P. & Roque, A. Histone H1 Post-Translational Modifications: Update and Future Perspectives. International journal of molecular sciences 21, 5941 (2020).
Xu, H. et al. PLMD: An updated data resource of protein lysine modifications. Journal of Genetics and Genomics 44, 243-250, doi:https://doi.org/10.1016/j.jgg.2017.03.007 (2017).
Kolmykov, S. et al. GTRD: an integrated view of transcription regulation. Nucleic Acids Research 49, D104-D111, doi:10.1093/nar/gkaa1057 (2021).
Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. Nature Reviews Genetics 20, 207-220 (2019).
Barski, A. et al. High-Resolution Profiling of Histone Methylations in the Human Genome. Cell 129, 823-837, doi:https://doi.org/10.1016/j.cell.2007.05.009 (2007).
Dogan, N. et al. Occupancy by key transcription factors is a more accurate predictor of enhancer activity than histone modifications or chromatin accessibility. Epigenetics Chromatin 8, 16, doi:10.1186/s13072-015-0009-5 (2015).
Mieczkowski, J. et al. MNase titration reveals differences between nucleosome occupancy and chromatin accessibility. Nat Commun 7, 1-11 (2016).
Azofeifa, J. G., Allen, M. A., Hendrix, J. R., Rubin, J. D. & Dowell, R. D. Enhancer RNA profiling predicts transcription factor activity. Genome research 28, 334-344 (2018).
Greenwald, W. W. et al. Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression. Nat Commun 10, 1-17 (2019).
Kaczmarczyk, A., Meng, H., Ordu, O., Noort, J. v. & Dekker, N. H. Chromatin fibers stabilize nucleosomes under torsional stress. Nat Commun 11, 126, doi:10.1038/s41467-019-13891-y (2020).
Hong, P. et al. The DLO Hi-C Tool for Digestion-Ligation-Only Hi-C Chromosome Conformation Capture Data Analysis. Genes 11, 289 (2020).
Ai, S. et al. Profiling chromatin states using single-cell itChIP-seq. Nature Cell Biology 21, 1164-1172, doi:10.1038/s41556-019-0383-5 (2019).
Lin, D. et al. Digestion-ligation-only Hi-C is an efficient and cost-effective method for chromosome conformation capture. Nature Genetics 50, 754-763, doi:10.1038/s41588-018-0111-2 (2018).
Jiang, S. & Mortazavi, A. Integrating ChIP-seq with other functional genomics data. Briefings in Functional Genomics 17, 104-115, doi:10.1093/bfgp/ely002 (2018).
Hoeijmakers, W. A. M. & Bártfai, R. in Chromatin Immunoprecipitation 83-101 (Springer, 2018).
Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300-307, doi:10.1038/s41586-020-03145-z (2021).
Ramani, V. et al. Sci-Hi-C: A single-cell Hi-C method for mapping 3D genome organization in large number of single cells. Methods 170, 61-68, doi:https://doi.org/10.1016/j.ymeth.2019.09.012 (2020).
Wolff, J. et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Research 48, W177-W184, doi:10.1093/nar/gkaa220 (2020).
Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Systems 3, 99-101, doi:https://doi.org/10.1016/j.cels.2015.07.012 (2016).
Kerpedjiev, P. et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biology 19, 125, doi:10.1186/s13059-018-1486-1 (2018).
Flyamer, I. M., Illingworth, R. S. & Bickmore, W. A. Coolpup.py: versatile pile-up analysis of Hi-C data. Bioinformatics 36, 2980-2985, doi:10.1093/bioinformatics/btaa073 (2020).
Belaghzal, H. et al. Liquid chromatin Hi-C characterizes compartment-dependent chromatin interaction dynamics. Nature Genetics 53, 367-378, doi:10.1038/s41588-021-00784-4 (2021).
Song, L. & Crawford, G. E. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harbor Protocols 2010, pdb.prot5384 (2010).
Jin, W. et al. Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature 528, 142-146, doi:10.1038/nature15740 (2015).
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods 10, 1213 (2013).
Cusanovich, D. A. et al. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348, 910, doi:10.1126/science.aab1601 (2015).
Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486-490, doi:10.1038/nature14590 (2015).
Zilionis, R. et al. Single-cell barcoding and sequencing using droplet microfluidics. Nature protocols 12, 44 (2017).
Chen, X., Miragaia, R. J., Natarajan, K. N. & Teichmann, S. A. A rapid and robust method for single cell chromatin accessibility profiling. Nat Commun 9, 5345, doi:10.1038/s41467-018-07771-0 (2018).
Lareau, C. A. et al. Massively parallel single-cell mitochondrial DNA genotyping and chromatin profiling. Nature Biotechnology, 1-11 (2020).
Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nature methods 14, 959-962 (2017).
Lareau, C. A. et al. Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nature Biotechnology 37, 916-924, doi:10.1038/s41587-019-0147-6 (2019).
Navidi, Z., Zhang, L. & Wang, B. simATAC: a single-cell ATAC-seq simulation framework. Genome Biology 22, 74, doi:10.1186/s13059-021-02270-w (2021).
Giresi, P. G. & Lieb, J. D. Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). Methods 48, 233-239, doi:https://doi.org/10.1016/j.ymeth.2009.03.003 (2009).
Segorbe, D. et al. An optimized FAIRE procedure for low cell numbers in yeast. Yeast 35, 507-512, doi:https://doi.org/10.1002/yea.3316 (2018).
Seuter, S., Neme, A. & Carlberg, C. in Epigenetics Methods Vol. 18 (ed Trygve Tollefsbol) 353-369 (Academic Press, 2020).
Chaithanya, K. NicE-seq: high resolution open chromatin profiling. Genome biology (2017).
Chin, H. G. et al. Universal NicE-seq for high-resolution accessible chromatin profiling for formaldehyde-fixed and FFPE tissues. Clinical Epigenetics 12, 143, doi:10.1186/s13148-020-00921-6 (2020).
Schones, D. E. et al. Dynamic Regulation of Nucleosome Positioning in the Human Genome. Cell 132, 887-898, doi:https://doi.org/10.1016/j.cell.2008.02.022 (2008).
Hu, S. e. et al. CAM: A quality control pipeline for MNase-seq data. PloS one 12, e0182771 (2017).
O’Geen, H., Echipare, L. & Farnham, P. J. in Epigenetics Protocols (ed Trygve O. Tollefsbol) 265-286 (Humana Press, 2011).
Texari, L. et al. An optimized protocol for rapid, sensitive and robust on-bead ChIP-seq from primary cells. STAR protocols 2, 100358 (2021).
Skene, P. J., Henikoff, J. G. & Henikoff, S. Targeted in situ genome-wide profiling with high efficiency for low cell numbers. Nature Protocols 13, 1006-1019, doi:10.1038/nprot.2018.015 (2018).
Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun 10, 1930, doi:10.1038/s41467-019-09982-5 (2019).
Harada, A. et al. A chromatin integration labelling method enables epigenomic profiling with lower input. Nature Cell Biology 21, 287-296, doi:10.1038/s41556-018-0248-3 (2019).
Cao, Z., Chen, C., He, B., Tan, K. & Lu, C. A microfluidic device for epigenomic profiling using 100 cells. Nature Methods 12, 959-962, doi:10.1038/nmeth.3488 (2015).
Adli, M. & Bernstein, B. E. Whole-genome chromatin profiling from limited numbers of cells using nano-ChIP-seq. Nature protocols 6, 1656-1668 (2011).
Rotem, A. et al. Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state. Nature biotechnology 33, 1165-1172, doi:10.1038/nbt.3383 (2015).
Wang, Q. et al. CoBATCH for High-Throughput Single-Cell Epigenomic Profiling. Molecular Cell 76, 206-216.e207, doi:https://doi.org/10.1016/j.molcel.2019.07.015 (2019).
Rhee, Ho S. & Pugh, B. F. Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell 147, 1408-1419, doi:https://doi.org/10.1016/j.cell.2011.11.013 (2011).
Zheng, X. et al. Low-Cell-Number Epigenome Profiling Aids the Study of Lens Aging and Hematopoiesis. Cell Reports 13, 1505-1518, doi:https://doi.org/10.1016/j.celrep.2015.10.004 (2015).
Murphy, T. W., Hsieh, Y.-P., Ma, S., Zhu, Y. & Lu, C. Microfluidic Low-Input Fluidized-Bed Enabled ChIP-seq Device for Automated and Parallel Analysis of Histone Modifications. Analytical Chemistry 90, 7666-7674, doi:10.1021/acs.analchem.8b01541 (2018).
Zhu, B. et al. MOWChIP-seq for low-input and multiplexed profiling of genome-wide histone modifications. Nature Protocols 14, 3366-3394, doi:10.1038/s41596-019-0223-x (2019).
Ma, S., Hsieh, Y.-P., Ma, J. & Lu, C. Low-input and multiplexed microfluidic assay reveals epigenomic variation across cerebellum and prefrontal cortex. Science Advances 4, eaar8187, doi:10.1126/sciadv.aar8187 (2018).
Ahmed, N. et al. GASAL2: a GPU accelerated sequence alignment library for high-throughput NGS data. BMC Bioinformatics 20, 520, doi:10.1186/s12859-019-3086-9 (2019).
Daily, J. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinformatics 17, 81, doi:10.1186/s12859-016-0930-z (2016).
Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Scientific Reports 9, 9354, doi:10.1038/s41598-019-45839-z (2019).
Wimberley, C. E. & Heber, S. PeakPass: Automating ChIP-Seq Blacklist Creation. Journal of Computational Biology 27, 259-268, doi:10.1089/cmb.2019.0295 (2019).
Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696-1697, doi:10.1093/bioinformatics/btr189 (2011).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Molecular cell 38, 576-589 (2010).
Wong, K.-C. MotifHyades: expectation maximization for de novo DNA motif pair discovery on paired sequences. Bioinformatics 33, 3028-3035, doi:10.1093/bioinformatics/btx381 (2017).
Kiesel, A. et al. The BaMM web server for de-novo motif discovery and regulatory sequence analysis. Nucleic Acids Research 46, W215-W220, doi:10.1093/nar/gky431 (2018).
Yu, Q., Wei, D. & Huo, H. SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets. BMC Bioinformatics 19, 228, doi:10.1186/s12859-018-2242-y (2018).
Bailey, T. L. STREME: Accurate and versatile sequence motif discovery. Bioinformatics, doi:10.1093/bioinformatics/btab203 (2021).
Fornes, O. et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Research 48, D87-D92, doi:10.1093/nar/gkz1001 (2020).
Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic acids research 46, D252-D259, doi:10.1093/nar/gkx1106 (2018).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017-1018, doi:10.1093/bioinformatics/btr064 (2011).
Yan, F., Powell, D. R., Curtis, D. J. & Wong, N. C. From reads to insight: a hitchhiker’s guide to ATAC-seq data analysis. Genome biology 21, 22 (2020).
Chen, X., Yu, B., Carriero, N., Silva, C. & Bonneau, R. Mocap: large-scale inference of transcription factor binding sites from chromatin accessibility. Nucleic Acids Research 45, 4315-4329, doi:10.1093/nar/gkx174 (2017).
Schmidt, F. et al. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic acids research 45, 54-66, doi:10.1093/nar/gkw1061 (2017).
Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729-736, doi:10.1038/s41586-020-2528-x (2020).
Quang, D. & Xie, X. FactorNet: A deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. Methods 166, 40-47, doi:https://doi.org/10.1016/j.ymeth.2019.03.020 (2019).
Quach, B. & Furey, T. S. DeFCoM: analysis and modeling of transcription factor binding sites using a motif-centric genomic footprinter. Bioinformatics 33, 956-963, doi:10.1093/bioinformatics/btw740 (2017).
Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome research 21, 447-455 (2011).
Baek, S., Goldstein, I. & Hager, G. L. Bivariate genomic footprinting detects changes in transcription factor activity. Cell reports 19, 1710-1722 (2017).
Tripodi, I. J., Allen, M. A. & Dowell, R. D. Detecting Differential Transcription Factor Activity from ATAC-Seq Data. Molecules 23, 1136 (2018).
Li, H., Quang, D. & Guan, Y. Anchor: trans-cell type prediction of transcription factor binding sites. Genome research 29, 281-292 (2019).
Keilwagen, J., Posch, S. & Grau, J. Accurate prediction of cell type-specific transcription factor binding. Genome Biology 20, 9, doi:10.1186/s13059-018-1614-y (2019).
Li, Z. et al. Identification of transcription factor binding sites using ATAC-seq. Genome Biology 20, 45, doi:10.1186/s13059-019-1642-2 (2019).
Stovner, E. B. & Sætrom, P. epic2 efficiently finds diffuse domains in ChIP-seq data. Bioinformatics 35, 4392-4393, doi:10.1093/bioinformatics/btz232 (2019).
Pachkov, M. et al. ISMARA: Completely automated inference of gene regulatory networks from high-throughput data. PeerJ Preprints 5, e3328v3321 (2017).
Qin, Q. et al. Lisa: inferring transcriptional regulators through integrative modeling of public chromatin accessibility and ChIP-seq data. Genome Biology 21, 32, doi:10.1186/s13059-020-1934-6 (2020).
Wijetunga, N. A. et al. SMITE: an R/Bioconductor package that identifies network modules by integrating genomic and epigenomic information. BMC Bioinformatics 18, 41, doi:10.1186/s12859-017-1477-3 (2017).
Moorthy, S. D. et al. Enhancers and super-enhancers have an equivalent regulatory role in embryonic stem cells through regulation of single or multiple genes. Genome research 27, 246-258, doi:10.1101/gr.210930.116 (2017).
Nowling, R. J., Geromel, R. R. & Halligan, B. in Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 1-1.
Sethi, A. et al. Supervised enhancer prediction with epigenetic pattern recognition and targeted validation. Nature Methods 17, 807-814, doi:10.1038/s41592-020-0907-8 (2020).
Abascal, F. et al. Perspectives on ENCODE. Nature 583, 693-698, doi:10.1038/s41586-020-2449-8 (2020).
Chen, L., Fish, A. E. & Capra, J. A. Prediction of gene regulatory enhancers across species reveals evolutionarily conserved sequence properties. PLoS computational biology 14, e1006484 (2018).
Kim, S. G., Harwani, M., Grama, A. & Chaterji, S. EP-DNN: A Deep Neural Network-Based Global Enhancer Prediction Algorithm. Scientific Reports 6, 38433, doi:10.1038/srep38433 (2016).
Tobias, I. C. et al. Transcriptional enhancers: from prediction to functional assessment on a genome-wide scale. Genome 64, 426-448, doi:10.1139/gen-2020-0104 (2020).
Ou, J. et al. ATACseqQC: a Bioconductor package for post-alignment quality assessment of ATAC-seq data. BMC Genomics 19, 169, doi:10.1186/s12864-018-4559-3 (2018).
Lai, B. et al. Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature 562, 281-285, doi:10.1038/s41586-018-0567-3 (2018).
Chen, K. et al. DANPOS: dynamic analysis of nucleosome position and occupancy by sequencing. Genome research 23, 341-351 (2013).
Polishko, A., Bunnik, E. M., Le Roch, K. G. & Lonardi, S. PuFFIN - a parameter-free method to build nucleosome maps from paired-end reads. BMC Bioinformatics 15, S11, doi:10.1186/1471-2105-15-S9-S11 (2014).
Schep, A. N. et al. Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome research 25, 1757-1770 (2015).
Tarbell, E. D. & Liu, T. HMMRATAC: a Hidden Markov ModeleR for ATAC-seq. Nucleic acids research 47, e91-e91 (2019).
Koh, P. W., Pierson, E. & Kundaje, A. Denoising Genome-wide Histone ChIP-seq with Convolutional Neural Networks. bioRxiv, 052118, doi:10.1101/052118 (2017).
Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nature genetics 39, 311-318 (2007).
Hon, G. C., Hawkins, R. D. & Ren, B. Predictive chromatin signatures in the mammalian genome. Human Molecular Genetics 18, R195-R201, doi:10.1093/hmg/ddp409 (2009).
Levitsky, V. et al. A single ChIP-seq dataset is sufficient for comprehensive analysis of motifs co-occurrence with MCOT package. Nucleic Acids Research 47, e139-e139, doi:10.1093/nar/gkz800 (2019).
Sun, W. et al. Histone Acetylome-wide Association Study of Autism Spectrum Disorder. Cell 167, 1385-1397.e1311, doi:https://doi.org/10.1016/j.cell.2016.10.031 (2016).
Ernst, J. & Kellis, M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature biotechnology 28, 817-825 (2010).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215-216, doi:10.1038/nmeth.1906 (2012).
Hoffman, M. M. et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature methods 9, 473 (2012).
Hoffman, M. M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Research 41, 827-841, doi:10.1093/nar/gks1284 (2013).
Yen, A. & Kellis, M. Systematic chromatin state comparison of epigenomes associated with diverse properties including sex and tissue type. Nat Commun 6, 7973, doi:10.1038/ncomms8973 (2015).
Zhang, Y., An, L., Yue, F. & Hardison, R. C. Jointly characterizing epigenetic dynamics across multiple human cell types. Nucleic Acids Research 44, 6721-6731, doi:10.1093/nar/gkw278 (2016).
Taudt, A., Nguyen, M. A., Heinig, M., Johannes, F. & Colomé-Tatché, M. chromstaR: Tracking combinatorial chromatin state dynamics in space and time. bioRxiv, 038612, doi:10.1101/038612 (2016).
Ernst, J. & Kellis, M. Chromatin-state discovery and genome annotation with ChromHMM. Nature protocols 12, 2478 (2017).
Tukiainen, T. et al. Landscape of X chromosome inactivation across human tissues. Nature 550, 244-248, doi:10.1038/nature24265 (2017).
Grosselin, K. et al. High-throughput single-cell ChIP-seq identifies heterogeneity of chromatin states in breast cancer. Nature genetics 51, 1060-1066 (2019).
Davis, C. A. et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Research 46, D794-D801, doi:10.1093/nar/gkx1081 (2018).
Bernstein, B. E. et al. The NIH Roadmap Epigenomics Mapping Consortium. Nature Biotechnology 28, 1045-1048, doi:10.1038/nbt1010-1045 (2010).
Bujold, D. et al. The International Human Epigenome Consortium Data Portal. Cell Systems 3, 496-499.e492, doi:https://doi.org/10.1016/j.cels.2016.10.019 (2016).
Zhang, Q. et al. hTFtarget: A Comprehensive Database for Regulations of Human Transcription Factors and Their Targets. Genomics, Proteomics & Bioinformatics 18, 120-128, doi:https://doi.org/10.1016/j.gpb.2019.09.006 (2020).
Feng, C. et al. KnockTF: a comprehensive human gene expression profile database with knockdown/knockout of transcription factors. Nucleic Acids Research 48, D93-D100, doi:10.1093/nar/gkz881 (2020).
Li, Y. et al. TRlnc: a comprehensive database for human transcriptional regulatory information of lncRNAs. Briefings in Bioinformatics 22, 1929-1939, doi:10.1093/bib/bbaa011 (2021).
Bai, X. et al. ENdb: a manually curated database of experimentally supported enhancers for human and mouse. Nucleic Acids Research 48, D51-D57, doi:10.1093/nar/gkz973 (2020).
Qian, F.-C. et al. SEanalysis: a web tool for super-enhancer associated regulatory analysis. Nucleic Acids Research 47, W248-W255, doi:10.1093/nar/gkz302 (2019).
Wang, F. et al. ATACdb: a comprehensive human chromatin accessibility database. Nucleic Acids Research 49, D55-D64 (2021).
Mei, S. et al. Cistrome Data Browser: a data portal for ChIP-Seq and chromatin accessibility data in human and mouse. Nucleic Acids Research 45, D658-D662, doi:10.1093/nar/gkw983 (2017).
Zheng, R. et al. Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis. Nucleic Acids Research 47, D729-D735, doi:10.1093/nar/gky1094 (2019).
Su, Y. et al. Neuronal activity modifies the chromatin accessibility landscape in the adult brain. Nature Neuroscience 20, 476-483, doi:10.1038/nn.4494 (2017).
Tyssowski, K. M. et al. Different Neuronal Activity Patterns Induce Different Gene Expression Programs. Neuron 98, 530-546.e511, doi:https://doi.org/10.1016/j.neuron.2018.04.001 (2018).
Cao, J. et al. A human cell atlas of fetal gene expression. Science 370 (2020).
Friedman, B. A. et al. Diverse Brain Myeloid Expression Profiles Reveal Distinct Microglial Activation States and Aspects of Alzheimer’s Disease Not Evident in Mouse Models. Cell Reports 22, 832-847, doi:https://doi.org/10.1016/j.celrep.2017.12.066 (2018).
Pastore, A. et al. Corrupted coordination of epigenetic modifications leads to diverging chromatin states and transcriptional heterogeneity in CLL. Nat Commun 10, 1874, doi:10.1038/s41467-019-09645-5 (2019).
Arruabarrena-Aristorena, A. et al. FOXA1 mutations reveal distinct chromatin profiles and influence therapeutic response in breast cancer. Cancer Cell 38, 534-550.e539 (2020).
de Nigris, F., Ruosi, C. & Napoli, C. Clinical efficiency of epigenetic drugs therapy in bone malignancies. Bone 143, 115605, doi:https://doi.org/10.1016/j.bone.2020.115605 (2021).
Xing, Q. R. et al. Parallel bimodal single-cell sequencing of transcriptome and chromatin accessibility. Genome research 30, 1027-1039 (2020).
Domcke, S. et al. A human cell atlas of fetal chromatin accessibility. Science 370 (2020).

Figure 1. Chromatin structure and arrangement. Image modified from 7.
Figure 2. Motif discovery process from ChIP-seq and SELEX data. Own elaboration, created in BioRender.
Figure 3. Motif identification from chromatin accessibility datasets. Own elaboration, created in BioRender.
Figure 4. Representation of a V-plot graph. It illustrates an arrangement in which two pairs of nucleosomes delimit an open chromatin region truncated by a TF, and the order of the fragments in that situation. B represents the characteristic V-plot pattern chart, where 0 corresponds to the TFBS. Obtained from 1.
Figure 5. Representation of chromatin states according to their representative marks. Image obtained from 36, unedited.

Deciphering transcription factors by modeling chromatin datasets

Dayna A. Rivera Devit

2021-06-05