Mohieddin Jafari
06 March, 2019
neat is the R package that implements NEAT, the Network Enrichment Analysis Test, published in Signorelli, M. et al., BMC Bioinformatics, 17:352, 2016.
The article is freely available from the website of BMC Bioinformatics.
Singular enrichment analysis (SEA) tests whether functional gene sets are over- or underrepresented within a set of genes of interest, such as a list of differentially expressed genes.
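To make the idea of over- and underrepresentation concrete, here is a minimal sketch of an SEA-style test in base R. It assumes the usual hypergeometric formulation, which this document does not spell out, and all counts are hypothetical toy numbers.

```r
# Hypothetical toy counts, for illustration only
N <- 6000  # genes in the background (universe)
K <- 150   # background genes annotated to the functional set
n <- 300   # genes in the set of interest (e.g. differentially expressed)
k <- 20    # genes of interest that are annotated to the functional set

# Overlap expected by chance if the two sets were unrelated
expected_k <- n * K / N

# Overrepresentation: probability of an overlap of at least k
p_over <- phyper(k - 1, m = K, n = N - K, k = n, lower.tail = FALSE)

# Underrepresentation: probability of an overlap of at most k
p_under <- phyper(k, m = K, n = N - K, k = n)

# Cross-check: one-sided Fisher's exact test on the 2 x 2 table
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

c(expected = expected_k, p_over = p_over, p_under = p_under)
```

The one-sided Fisher test and the upper hypergeometric tail are the same quantity, so either can be used for the overrepresentation p-value.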
Gene set enrichment analysis (GSEA) makes use of the whole list of genes ranked by expression value, and tests the tendency of genes belonging to a functional set to occupy positions at the top (or at the bottom) of the list.
The network enrichment analysis test (NEAT) incorporates known networks, with their information on dependences between genes, into gene enrichment tests. The purpose of network enrichment analysis is thus to integrate gene enrichment analysis (GEA) tests with information on known relations between genes, represented by means of a network.
The analysis will typically consist of three steps: preparation of the data, computation of the test and inspection of the results.
library(neat) # provides neat() and the yeast data
data(yeast) # load the data
ls(yeast)
[1] "esr1" "esr2" "goslimproc" "kegg" "yeastnet"
[6] "ynetgenes"
yeast$esr1 and yeast$esr2, sets of differentially expressed genes;
yeast$goslimproc, some functional gene sets.
induced_genes <- list('ESR 2' = yeast$esr2)
# set of differentially expressed genes
functional_sets <- yeast$goslimproc[c(72, 75, 80)]
# Three functional gene sets of interest
Two further objects are needed:
yeast$yeastnet, a two-column matrix that contains YeastNet (edge list);
yeast$ynetgenes, a vector containing the names of all the genes that are present in the network (node list).
Idea behind NEAT:
If two gene sets are related, then in the network we expect them to be connected by a larger (or smaller) number of links than we would expect to observe by chance.
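The comparison between observed and expected links can be sketched in base R on a toy network. This is a simplified illustration, not the package's implementation. The gene names are hypothetical, the degree-based hypergeometric null is my reading of the NEAT paper for undirected networks, and the two-sided p-value uses one common convention that may differ from what neat() computes.

```r
# Toy undirected network as a two-column edge list (hypothetical genes)
edges <- matrix(c("g1", "g2",
                  "g1", "g3",
                  "g2", "g4",
                  "g3", "g4",
                  "g4", "g5",
                  "g5", "g6",
                  "g1", "g5"), ncol = 2, byrow = TRUE)

A <- c("g1", "g2")  # hypothetical set of interest
B <- c("g4", "g5")  # hypothetical functional set

# Observed number of links with one end in A and the other in B
nab <- sum((edges[, 1] %in% A & edges[, 2] %in% B) |
           (edges[, 1] %in% B & edges[, 2] %in% A))

# Node degrees, and total degree of A, of B and of the whole network
deg <- table(as.vector(edges))
d_A <- sum(deg[A])
d_B <- sum(deg[B])
d_V <- sum(deg)

# Assumed null: hypergeometric with d_A draws, d_B "successes", d_V in total
expected_nab <- d_A * d_B / d_V
p_upper <- phyper(nab - 1, m = d_B, n = d_V - d_B, k = d_A, lower.tail = FALSE)
p_lower <- phyper(nab, m = d_B, n = d_V - d_B, k = d_A)

# Two-sided p-value (assumed convention), capped at 1
p_two_sided <- min(1, 2 * min(p_upper, p_lower))

c(nab = nab, expected_nab = expected_nab, pvalue = p_two_sided)
```

An observed count well above the expected value points towards overenrichment, and a count well below it towards underenrichment.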
The core function neat performs the computation:
library(dplyr) # provides %>% and as_tibble(); tbl_df() is deprecated
test <- neat(alist = induced_genes, blist = functional_sets, network = yeast$yeastnet, nettype = 'undirected', nodes = yeast$ynetgenes, alpha = 0.01) %>% as_tibble()
The results are now saved in the object test:
test
# A tibble: 3 x 6
  A     B                        nab expected_nab   pvalue conclusion
  <fct> <fct>                  <dbl>        <dbl>    <dbl> <fct>
1 ESR 2 response_to_heat          86         96.9 2.52e- 1 No enrichment
2 ESR 2 response_to_starvation   459        331.  5.72e-12 Overenrichment
3 ESR 2 RNA_catabolic_process    108        198.  5.78e-13 Underenrichment
neat(alist, blist, network, nettype, nodes, alpha = NULL,
anames = NULL, bnames = NULL)
The fundamental arguments of the function are:
alist and blist, two lists of gene sets;
network, which can be specified in three different formats (a two-column edge-list matrix is used in the example above);
nettype, either 'undirected' or 'directed';
nodes, a vector containing the names of all nodes in the network;
alpha, which specifies the significance level of the test.
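The arguments listed above can be illustrated with a small, self-contained input set. This is a sketch with hypothetical gene and set names. The call to neat() uses only the arguments shown in the signature above and runs only if the package is installed. No output is shown, because the result on such a tiny network is not meaningful.

```r
# Hypothetical toy inputs, for illustration only
toy_alist <- list(set_A = c("g1", "g2"))           # gene sets of interest
toy_blist <- list(set_B1 = c("g4", "g5"),          # functional gene sets
                  set_B2 = c("g3", "g6"))

# Network as a two-column edge list, as for yeast$yeastnet
toy_network <- matrix(c("g1", "g2",
                        "g1", "g3",
                        "g2", "g4",
                        "g3", "g4",
                        "g4", "g5",
                        "g5", "g6",
                        "g1", "g5"), ncol = 2, byrow = TRUE)

# Node list: names of all genes in the network
toy_nodes <- sort(unique(as.vector(toy_network)))

# Sanity checks before running the test
stopifnot(ncol(toy_network) == 2,
          all(unlist(toy_alist) %in% toy_nodes),
          all(unlist(toy_blist) %in% toy_nodes))

# Run the test only if the neat package is available
if (requireNamespace("neat", quietly = TRUE)) {
  toy_test <- neat::neat(alist = toy_alist, blist = toy_blist,
                         network = toy_network, nettype = "undirected",
                         nodes = toy_nodes, alpha = 0.05)
  print(toy_test)
}
```

Checking that every gene in alist and blist appears in nodes before calling neat() helps catch identifier mismatches between the gene sets and the network early.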
I think we need to create or assemble a more comprehensive tool that combines the advantages of SEA, GSEA and NEAT.
What do you think?