An introduction to NEAT

Mohieddin Jafari
06 March, 2019

What's NEAT?

neat is the R package that implements NEAT, the Network Enrichment Analysis Test, published in Signorelli, M. et al., BMC Bioinformatics, 17:352, 2016.

The article is freely available from the website of BMC Bioinformatics.

Gene Enrichment Analysis

Singular enrichment analysis (SEA) tests for over- or underrepresentation of functional gene sets within a given set of genes.
Gene set enrichment analysis (GSEA) makes use of the whole list of genes ranked by expression value, and tests the tendency of genes belonging to a functional set to occupy positions at the top (or at the bottom) of the list.
Network enrichment analysis test (NEAT) incorporates related networks, with their information on gene dependences, into gene enrichment tests. The purpose of network enrichment analysis is thus to integrate gene enrichment analysis (GEA) tests with information on known relations between genes, represented by means of a network.
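To make the first of these approaches concrete, the sketch below runs an SEA-style overrepresentation test with the hypergeometric distribution in base R. All counts are hypothetical and chosen only for illustration; they do not come from the yeast data used later.

```r
# Hypothetical counts (assumptions, for illustration only)
n_universe <- 6000  # genes in the background
n_set      <- 200   # genes in the functional set
n_selected <- 300   # genes in the list of interest
n_overlap  <- 25    # selected genes that belong to the functional set

# Overlap expected if the selected genes were drawn at random
expected_overlap <- n_selected * n_set / n_universe

# P(overlap >= n_overlap): upper tail of the hypergeometric distribution
p_over <- phyper(n_overlap - 1, n_set, n_universe - n_set, n_selected,
                 lower.tail = FALSE)

# P(overlap <= n_overlap): lower tail, for underrepresentation
p_under <- phyper(n_overlap, n_set, n_universe - n_set, n_selected)
```

A small p_over suggests overrepresentation and a small p_under suggests underrepresentation. The test uses only set membership, with no information on how the genes are connected.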

Workflow

A quick look at an example

The analysis will typically consist of three steps: preparation of the data, computation of the test and inspection of the results.

library(neat) # load the package
data(yeast) # load the data
ls(yeast)

[1] "esr1"       "esr2"       "goslimproc" "kegg"       "yeastnet"  
[6] "ynetgenes"

yeast$esr1 and yeast$esr2, sets of differentially expressed genes.
yeast$goslimproc, some functional gene sets.

induced_genes <- list('ESR 2' = yeast$esr2) 
# set of differentially expressed genes

functional_sets <- yeast$goslimproc[c(72, 75, 80)] 
# Three functional gene sets of interest

Two further objects are needed:

yeast$yeastnet, a two-column matrix that contains YeastNet (edge list);
yeast$ynetgenes, a vector containing the names of all the genes that are present in the network (node list)

Computation of the test

Idea behind NEAT:

If two gene sets are related, then in the network we expect them to be connected by a larger (or smaller) number of links than we would expect to observe by chance.
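The idea can be illustrated on a toy undirected network in base R. This is a simplified sketch and not the package's implementation: the network, the gene names and the two sets are made up, and the two-sided p-value (twice the smaller tail, capped at 1) is an assumption made here for simplicity.

```r
# Toy undirected network as a two-column edge list (hypothetical genes)
edges <- matrix(c("g1", "g2",  "g1", "g5",  "g2", "g5",  "g2", "g6",
                  "g3", "g6",  "g3", "g4",  "g4", "g7",  "g5", "g6",
                  "g7", "g8",  "g1", "g6"),
                ncol = 2, byrow = TRUE)
A <- c("g1", "g2")  # first gene set (assumed disjoint from B)
B <- c("g5", "g6")  # second gene set

# Observed number of links with one end in A and the other in B
links_ab <- (edges[, 1] %in% A & edges[, 2] %in% B) |
            (edges[, 1] %in% B & edges[, 2] %in% A)
n_ab <- sum(links_ab)

# Node degrees, and total degree of A, of B and of the whole network
deg <- table(c(edges[, 1], edges[, 2]))
d_A <- sum(deg[A])
d_B <- sum(deg[B])
d_V <- sum(deg)

# Number of A-B links expected by chance under a hypergeometric model
expected_nab <- d_A * d_B / d_V

# Tail probabilities and a simple two-sided p-value (assumption)
p_upper <- phyper(n_ab - 1, d_B, d_V - d_B, d_A, lower.tail = FALSE)
p_lower <- phyper(n_ab, d_B, d_V - d_B, d_A)
p_value <- min(1, 2 * min(p_upper, p_lower))
```

Here n_ab is larger than expected_nab, which points towards overenrichment, although a network this small cannot give a meaningful p-value.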

The core function, neat, carries out the computation:

library(dplyr) # provides %>% and as_tibble()
test <- neat(alist = induced_genes, blist = functional_sets,
             network = yeast$yeastnet, nettype = 'undirected',
             nodes = yeast$ynetgenes, alpha = 0.01) %>% as_tibble()

Analysis of the results

The results are now saved in the object test:

test

# A tibble: 3 x 6
  A     B                        nab expected_nab   pvalue conclusion     
  <fct> <fct>                  <dbl>        <dbl>    <dbl> <fct>          
1 ESR 2 response_to_heat          86         96.9 2.52e- 1 No enrichment  
2 ESR 2 response_to_starvation   459        331.  5.72e-12 Overenrichment 
3 ESR 2 RNA_catabolic_process    108        198.  5.78e-13 Underenrichment
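One way to read this table is to compare nab with expected_nab and the p-value with alpha. The sketch below re-enters the printed values by hand (so expected_nab is rounded as displayed) and re-derives the conclusion column; the rule used is an assumption that is consistent with the output above, not code taken from the package.

```r
# Values copied from the printed output above (rounded as displayed)
res <- data.frame(
  B            = c("response_to_heat", "response_to_starvation",
                   "RNA_catabolic_process"),
  nab          = c(86, 459, 108),
  expected_nab = c(96.9, 331, 198),
  pvalue       = c(2.52e-1, 5.72e-12, 5.78e-13)
)
alpha <- 0.01  # significance level used in the call to neat

# Assumed rule: significant and nab above expectation -> overenrichment,
# significant and nab below expectation -> underenrichment
res$direction <- ifelse(res$pvalue >= alpha, "No enrichment",
                 ifelse(res$nab > res$expected_nab,
                        "Overenrichment", "Underenrichment"))

# Ratio of observed to expected links, as a rough effect size
res$ratio <- res$nab / res$expected_nab

# Keep only the significant pairs
significant <- res[res$pvalue < alpha, ]
```

The derived direction column matches the conclusion column reported by neat for these three gene sets.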

The general form of the call is:

neat(alist, blist, network, nettype, nodes, alpha = NULL, 
anames = NULL, bnames = NULL)

The fundamental arguments of the function are:

alist and blist, two lists of gene sets;
network, which can be specified in three different formats:
- a sparse adjacency matrix,
- an igraph graph,
- an edge list in a two-column matrix;

nettype, either 'undirected' or 'directed';
nodes, a vector containing the names of all nodes in the network;
alpha, which specifies the significance level of the test.
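The edge-list and adjacency-matrix formats carry the same information. As a minimal sketch, the code below turns a small made-up undirected edge list into a symmetric sparse adjacency matrix with the Matrix package. The gene names are hypothetical, and the exact matrix class that neat expects is not checked here.

```r
library(Matrix)

# Hypothetical undirected edge list in a two-column matrix
edges <- matrix(c("g1", "g2",
                  "g1", "g3",
                  "g2", "g4"),
                ncol = 2, byrow = TRUE)

# Node list: every gene that appears in the network
nodes <- sort(unique(as.vector(edges)))

# Row and column indices of each edge
i <- match(edges[, 1], nodes)
j <- match(edges[, 2], nodes)

# Enter each edge in both directions so that the matrix is symmetric
adj <- sparseMatrix(i = c(i, j), j = c(j, i), x = 1,
                    dims = c(length(nodes), length(nodes)),
                    dimnames = list(nodes, nodes))
```

For a directed network, each edge would be entered once, from the first column to the second, and the matrix would in general not be symmetric.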

Performance Evaluation

Table 1

Table 2

Let's Discuss

I think we need to create or assemble a more comprehensive tool that combines the advantages of SEA, GSEA and NEAT.

What do you think?