Microarrays are one of the most popular tools for understanding biological phenomena through large-scale measurements of biological samples, typically DNA, RNA, or proteins. The technique has been used for a range of purposes in life science research, from gene expression profiling to SNP and other biomarker identification, and further, to understand relations between genes and their activities on a large scale. Likewise, several platforms are used to produce these arrays, such as Affymetrix and Illumina. In this chapter, we mainly focus on microarrays that measure gene expression with nucleic acid samples, and we use Affymetrix CEL file data for explanations. Nevertheless, most of the techniques can be applied to other platforms with slight modifications.
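Download GSE24460 from NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE24460). This gives you a file named GSE24460_RAW.tar in your chosen directory. Extract it (for example, with R's untar function) into a directory such as GSE24460_RAW; ReadAffy can read the gzipped CEL files it contains directly.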
# Install affy from Bioconductor; BiocManager replaces the older biocLite.R installer
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("affy")
library(affy)
To read all the files in the directory, use the ReadAffy function as follows:
getwd()
[1] "/Users/Bill/Documents/GitHub/UsefulCodes/R"
myData <- ReadAffy(celfile.path="../data/Bioinfor_R/GSE24460_RAW/")
myData
AffyBatch object
size of arrays=732x732 features (18 kb)
cdf=HG-U133A_2 (22277 affyids)
number of samples=4
number of genes=22277
annotation=hgu133a2
notes=
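If you wish to read only one or a few files, pass their paths as a character vector to the filenames argument, for example:
myData1 <- ReadAffy(filenames = file.path("../data/Bioinfor_R/GSE24460_RAW", c("GSM602658_MCF71.CEL.gz", "GSM602659_MCF72.CEL.gz")))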
The ExpressionSet class in Bioconductor combines several different sources of information into one data structure. For an array experiment, it contains the intensities, the phenotype data, and the experiment information as well as annotation information. When we read a set of CEL files with the ReadAffy or read.affybatch function, an AffyBatch object is created; like ExpressionSet, it extends the eSet class. The AffyBatch object holds probe-level data, whereas an ExpressionSet holds probeset-level (summarized) data. Sometimes, however, the data is not available as CEL files but as separate text files, and we must build an ExpressionSet object from these individual files from scratch to facilitate the analysis. This recipe presents the solution to this problem, which works for data from any platform, be it Affymetrix or Illumina.
BiocManager::install("Biobase")   # requires the BiocManager package
library(Biobase)
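As demo expression and phenotypic data (pData) files, we will use the example data shipped with the Biobase package. Their locations can be fetched as follows:
dataDirectory <- system.file("extdata", package = "Biobase")
exprsLoc <- file.path(dataDirectory, "exprsData.txt")
pDataLoc <- file.path(dataDirectory, "pData.txt")
exprsLoc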
[1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library/Biobase/extdata/exprsData.txt"
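Read the table from the text file that contains the expression values using the usual read.table or read.csv function, and convert it into a matrix, as follows:
exprs <- as.matrix(read.table(exprsLoc, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE))
dim(exprs)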
[1] 500 26
The expression data is a matrix that contains the measured intensities, whereas the phenotypic data carries information about the samples, such as their conditions (for example, control or disease).
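Now, read the phenotype information file in a similar way, combine both tables into an ExpressionSet object, and check that the object is valid:
pData <- read.table(file.path(system.file("extdata", package = "Biobase"), "pData.txt"), header = TRUE, sep = "\t", row.names = 1)
exampleSet <- ExpressionSet(assayData = exprs, phenoData = AnnotatedDataFrame(pData))
validObject(exampleSet)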
[1] TRUE
For the quality checks, we will use the AffyBatch object created in the previous recipe (the myData object from the Reading CEL files recipe).
sampleNames(myData)
[1] "GSM602658_MCF71.CEL.gz"     "GSM602659_MCF72.CEL.gz"     "GSM602660_MCF7226ng.CEL.gz" "GSM602661_MCF7262ng.CEL.gz"
Quality problems can arise during hybridization, for example from uneven fluorescence across the chip, which produces variable intensity distributions. Nonspecific binding and other biological or technical factors can add background noise to the data. An inappropriate experimental design can also affect the dataset as a whole.
BiocManager::install("arrayQualityMetrics")
library(arrayQualityMetrics)
Use the following arrayQualityMetrics function to create plots to assess the data quality:
arrayQualityMetrics(expressionset = myData, outdir = "quality_assesment", force = TRUE, do.logtransform = TRUE)  # log-transform the raw AffyBatch intensities
The report will be written into directory 'quality_assesment'.
The checks include between-array distances, Principal Component Analysis (PCA), density plots, MA plots, and RNA degradation plots.
Go to the created subdirectory (in your case, quality_assesment) in the current working directory to check the created HTML page (index.html) and plots.
browseURL(file.path("quality_assesment", "index.html"))
You can create these plots and assessments individually as well. For example, to create an MA plot, use the MAplot function as follows:
MAplot(myData)
The MA plot is a variant of the Bland-Altman plot and is based on two components, M and A. The M component is the log2 ratio between two channels (or two arrays), M = log2(R/G), where R and G are the two intensities at a spot; it thus indicates which channel binds more strongly at that spot. The A component is the average log2 intensity at the spot, A = (log2 R + log2 G)/2. Plotting these two components against each other (usually M on the y axis and A on the x axis) reveals intensity-dependent bias in the data, which makes the plot useful for detecting background effects and outliers. For instance, we can use it for the pairwise comparison of arrays or to compare the intensities of the two dyes in two-channel data. A deviation from the M=0 line (an asymmetrical distribution around M=0) indicates intensity bias, outliers, or even differentially expressed (DE) genes. Such deviations can be corrected to some extent by normalization. (In fact, the MA plot is often used together with the boxplot, discussed later, to judge whether normalization is needed.) A trend on the left side (lower A) indicates the presence of background, while a trend on the right side (higher A) indicates saturation.
Most of the spots in the plot usually lie around the M=0 line, with small interquartile ranges across the arrays, and probably represent non-differentially expressed (non-DE) genes. However, such conclusions should be drawn only after background correction and normalization.
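To make the definitions of M and A concrete, here is a minimal sketch that computes them by hand for two simulated arrays; array1 and array2 are hypothetical vectors, not objects from this recipe, and the sketch is not how affy's MAplot is implemented internally. Because the second array is just the first one plus symmetric noise, the cloud should center on M=0.

```r
# Sketch: M and A values for two arrays, using simulated intensities.
# array1/array2 are hypothetical vectors, not objects from the recipe.
set.seed(1)
array1 <- 2^rnorm(1000, mean = 8, sd = 2)             # raw intensities of array 1
array2 <- array1 * 2^rnorm(1000, mean = 0, sd = 0.2)  # array 2: same signal plus noise
M <- log2(array1) - log2(array2)          # log2 ratio between the arrays
A <- (log2(array1) + log2(array2)) / 2    # average log2 intensity
plot(A, M, pch = 20, cex = 0.3, main = "MA plot (simulated arrays)")
abline(h = 0, col = "red")                # reference line M = 0
median(M)                                 # close to 0 when there is no bias
```

Note that A + M/2 and A - M/2 recover the two log2 intensities, so the MA transformation is only a 45-degree rotation of the usual array-versus-array scatter plot.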
To plot the densities of the log intensities, use the hist method that affy provides for AffyBatch objects, which plots a smoothed density of the log2 intensities of each array:
hist(myData)
The next plot we discussed is the density plot, which shows density estimates of the log intensities. Good-quality data has a similar shape and range across the arrays, and deviations carry different kinds of information. Data with high background noise shifts the entire distribution to the right. A diminished right tail indicates signal loss, and a bulge at the upper end of the distribution indicates signal saturation. A multimodal shape (that is, more than one peak) may indicate a spatial artifact, such as regional bias on the array.
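The background effect described above can be reproduced with base R alone. The following sketch uses a hypothetical matrix of four simulated arrays (not the GSE24460 data) and adds a constant background to the raw intensities of array4; its density curve is then shifted to the right and squeezed against a floor at log2 of the background level.

```r
set.seed(2)
# Hypothetical raw intensities: 1000 probes x 4 arrays
intens <- matrix(2^rnorm(4000, mean = 7, sd = 1.5), ncol = 4,
                 dimnames = list(NULL, paste0("array", 1:4)))
intens[, 4] <- intens[, 4] + 200   # hypothetical extra background on array 4
logI <- log2(intens)
# One kernel density estimate per array
dens <- lapply(seq_len(ncol(logI)), function(k) density(logI[, k]))
plot(dens[[1]], xlim = range(logI),
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))),
     main = "Density of log2 intensities", xlab = "log2 intensity")
for (k in 2:4) lines(dens[[k]], col = k)
legend("topright", legend = colnames(logI), col = 1:4, lty = 1)
# Position of the highest peak of each curve
peak <- sapply(dens, function(d) d$x[which.max(d$y)])
peak
```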
To create the boxplots, simply call the boxplot function on the AffyBatch object as follows:
boxplot(myData)
The boxplots of high-quality data show similar widths and positions, and they represent the distribution of signal intensities in the data. The distributions are usually plotted on the log scale to keep the plot readable. A major deviation in a boxplot might indicate an experimental flaw or noise in that particular array. A Kolmogorov-Smirnov (KS) statistic computed on these distributions can be used to detect outlier arrays. Moreover, such deviations can usually be removed by normalizing the data, which will be covered in upcoming recipes.
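One simple way to turn the KS idea into numbers is to compare each array's distribution with the pooled distribution of all arrays. The sketch below does this with base R's ks.test on a hypothetical matrix of log2 intensities in which array3 is shifted upwards; it illustrates the principle and is not the exact procedure arrayQualityMetrics uses. Warnings about ties are suppressed because each array is itself part of the pooled sample.

```r
set.seed(3)
# Hypothetical log2 intensities: 1000 probes x 4 arrays
logI <- matrix(rnorm(4000, mean = 7, sd = 1.5), ncol = 4,
               dimnames = list(NULL, paste0("array", 1:4)))
logI[, 3] <- logI[, 3] + 1   # hypothetical outlier array, shifted upwards
boxplot(logI, ylab = "log2 intensity", main = "Simulated arrays")
pooled <- as.vector(logI)
# KS statistic: largest distance between each array's ECDF and the pooled ECDF
ks_stat <- apply(logI, 2, function(x)
  unname(suppressWarnings(ks.test(x, pooled))$statistic))
round(ks_stat, 3)
names(which.max(ks_stat))   # the array that deviates most
```

The array with the largest statistic is the candidate outlier; in practice, a threshold on the statistic (rather than just the maximum) decides whether it is flagged.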
To get the RNA degradation plot, use the AffyRNAdeg function and then plotAffyRNAdeg as follows:
rnaDeg <- AffyRNAdeg(myData)
plotAffyRNAdeg(rnaDeg)
The RNA degradation plot indicates the quality of the samples used in array hybridization. mRNAs have a limited lifespan, after which they are degraded, and degraded transcripts no longer give reliable measurements of expression levels. Degradation is assumed to proceed from the 5' end towards the 3' end, so in a degraded sample the intensities are lower at the 5' end than at the 3' end. The intensities of the probes on an array, grouped by their position within the probe sets, therefore indicate the level of degradation in the sample. We represent this as an RNA degradation plot, where probes are numbered sequentially from the 5' end to the 3' end of the transcript and the mean intensity at each position is plotted. The intensities should show an upward trend along the probe numbers (more degradation, and hence lower intensity, at the 5' end). In this recipe, we check whether the lines in the plot follow a consistent trend across arrays; a deviating line indicates problems with the sample used for hybridization.
Check the details of the rnaDeg object as follows:
summaryAffyRNAdeg(rnaDeg)
GSM602658_MCF71.CEL.gz GSM602659_MCF72.CEL.gz GSM602660_MCF7226ng.CEL.gz GSM602661_MCF7262ng.CEL.gz
slope 2.19e+00 2.22e+00 2.73e+00 1.74e+00
pvalue 2.87e-13 5.01e-13 3.40e-11 1.51e-06
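The slope row in the summary above measures how steeply the mean intensity rises from the 5' to the 3' probes. The following sketch reproduces the core idea on simulated data: average the intensities at each probe position over all probe sets, then fit a straight line. The matrix pm and its dimensions are hypothetical, and AffyRNAdeg additionally shifts and scales the means by their standard errors, which this simplified version omits, so the numbers are not comparable with the output above.

```r
set.seed(4)
n_sets <- 500; n_probes <- 11        # hypothetical: 500 probe sets, 11 probes each
# Simulated log2 intensities that rise from the 5' end (probe 1) to the 3' end
pos_effect <- 0.15 * (seq_len(n_probes) - 1)
pm <- matrix(rnorm(n_sets * n_probes, mean = 8, sd = 1), nrow = n_sets) +
  matrix(pos_effect, nrow = n_sets, ncol = n_probes, byrow = TRUE)
mean_by_pos <- colMeans(pm)                    # mean intensity at each probe position
probe_number <- seq_len(n_probes)
fit <- lm(mean_by_pos ~ probe_number)          # straight-line fit along the probes
slope <- unname(coef(fit)["probe_number"])
plot(probe_number, mean_by_pos, type = "b",
     xlab = "Probe number (5' to 3')", ylab = "Mean log2 intensity")
slope
```

Arrays whose slopes differ markedly from the others are the ones to inspect for sample degradation.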
The between-array distance used in the quality report is the mean absolute difference between the intensities of two arrays:
\[d_{ij} = \frac{1}{n}\sum_{a=1}^{n}\left|I_{ia} - I_{ja}\right|\]
In the preceding formula, \(I_{ia}\) and \(I_{ja}\) are the intensity measurements for the a-th probe in arrays i and j, respectively, and n is the number of probes.
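The formula is the Manhattan (L1) distance between two arrays divided by the number of probes, so base R's dist function can compute the whole distance matrix at once. The matrix I below is hypothetical simulated data in which array arr3 carries a constant offset; arrays are stored as columns, hence the transpose before calling dist.

```r
set.seed(5)
# Hypothetical log2 intensities: 1000 probes x 3 arrays
I <- matrix(rnorm(3000, mean = 8), ncol = 3,
            dimnames = list(NULL, c("arr1", "arr2", "arr3")))
I[, "arr3"] <- I[, "arr3"] + 2          # hypothetical offset on array 3
n <- nrow(I)                             # number of probes
# Manhattan (L1) distance between columns divided by n = mean absolute difference
d <- as.matrix(dist(t(I), method = "manhattan")) / n
round(d, 3)
```

The offset array ends up far from the other two, which is how an outlying array shows up in the distance heatmap of the report.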
Developing new methods of analyzing expression data requires proper testing and performance checks on large, high-quality datasets obtained under many experimental conditions. This requires benchmark data with known parameters. Such experimental data is usually not available, and performing such experiments in a wet lab is not economical. Therefore, we need well-characterized synthetic datasets that allow thorough testing of learning algorithms in a fast and reproducible manner, and it is common practice to use simulated (artificial) datasets for this purpose. This recipe presents an approach to generating such datasets.
In this recipe, we will generate a dataset of 35,000 genes, 1 percent of which are DE genes. In order to do so, perform the following steps:
The madsim package generates data for two biological conditions whose characteristics are specified in terms of statistical parameters. The madsim function uses a beta distribution to generate n values between 0 and 1.
It uses the following four components to generate the expression values of a gene:
* The expression levels of non-DE genes
* The expression levels of DE genes
* Noise
* Technical noise
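To show how these ingredients fit together, here is a toy simulation in base R. It is not the madsim algorithm: it only borrows the idea of beta-distributed baseline levels, rescaled to a log2 intensity range, plus a small set of shifted DE genes, and it collapses the separate noise components into a single Gaussian term. The number of arrays per group, the range lb to ub, the beta shape parameters, and the shift sizes are all hypothetical choices.

```r
set.seed(6)
n_genes <- 35000                      # from the text
p_de <- 0.01                          # 1 percent DE genes, from the text
n_per_group <- 4                      # hypothetical number of arrays per condition
lb <- 4; ub <- 14                     # hypothetical log2 intensity range
# Non-DE expression levels: beta-distributed values in [0, 1] rescaled to [lb, ub]
base <- lb + (ub - lb) * rbeta(n_genes, shape1 = 2, shape2 = 4)
# DE genes: random subset shifted up or down by 1 to 3 log2 units
de <- sample(n_genes, round(p_de * n_genes))
shift <- numeric(n_genes)
shift[de] <- sample(c(-1, 1), length(de), replace = TRUE) * runif(length(de), 1, 3)
# A single Gaussian noise term stands in for madsim's separate noise components
simulate_group <- function(mu, sd = 0.3) {
  matrix(rnorm(length(mu) * n_per_group, mean = rep(mu, n_per_group), sd = sd),
         nrow = length(mu))
}
sim <- cbind(simulate_group(base), simulate_group(base + shift))
colnames(sim) <- c(paste0("control", 1:n_per_group), paste0("treated", 1:n_per_group))
dim(sim)
```

Because the indices in de are known, such a dataset can serve as ground truth for checking how many DE genes an analysis method recovers.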
Microarrays are high-throughput methods that measure the expression levels of thousands of genes simultaneously. Because each sample is handled separately, a small difference in RNA quantities and/or experimental errors may cause the intensity level to vary from one replicate to another. Handling this inherent problem requires normalizing the data, which minimizes the technical effects and renders the arrays comparable. This recipe explores a few of the many normalization methods available in R.
library(affy)                    # normalize.AffyBatch.methods is provided by affy
normalize.AffyBatch.methods()    # list the available normalization methods
Variance Stabilization and Normalization (VSN) is based on the assumption that the variance of microarray data depends on the signal intensity and there exists a transformation that keeps the variance approximately constant. This means that the vsn method finds a transformation of the intensity measures in the data so as to keep the variance of intensity approximately independent of its mean. The normalize.AffyBatch.vsn function is actually a wrapper for the vsn function in the vsn library (not the affy library).
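The idea of a variance-stabilizing transformation can be illustrated without the vsn package. The sketch below simulates intensities with multiplicative plus additive noise, so the raw standard deviation grows with the mean, and then applies the arsinh (generalized log) transform on which vsn is based. Unlike vsn, which estimates the transformation parameters from the data, this sketch assumes the noise parameters are known and sets the scale c_par from them; the three intensity levels are hypothetical.

```r
set.seed(7)
mu <- rep(c(100, 1000, 10000), each = 2000)   # three hypothetical true intensity levels
# Multiplicative (10%) plus additive (sd 20) noise: variance grows with the mean
y <- mu * exp(rnorm(length(mu), sd = 0.1)) + rnorm(length(mu), sd = 20)
c_par <- 20 / 0.1                             # additive sd / multiplicative sd (known here)
h <- asinh(y / c_par)                         # arsinh (generalized log) transform
raw_sd <- tapply(y, mu, sd)                   # sd of raw values at each level
vst_sd <- tapply(h, mu, sd)                   # sd after the transformation
rbind(raw_sd, vst_sd)
```

On the raw scale, the spread differs by more than an order of magnitude between the levels, whereas after the transformation it is roughly constant, which is exactly the property VSN aims for.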
The second method, loess normalization, uses locally weighted regression to normalize the data: it fits a smoothing curve to the data, and the degree of smoothing is determined by the window width (span) parameter. A larger window results in a smoother curve, and a smaller window follows more local variation. The normalize.AffyBatch.loess function uses R's loess function to fit and smooth the data. The default window size is 2/3, which can be modified with the span argument.
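The following sketch shows the principle for a single pair of arrays in base R: fit a loess curve of M on A with span = 2/3 and subtract it, which removes an intensity-dependent bias. The arrays x1 and x2 are hypothetical simulated log2 values; normalize.AffyBatch.loess applies this idea across all pairs of arrays and iterates, which the sketch does not attempt.

```r
set.seed(8)
signal <- runif(2000, 4, 14)                  # hypothetical true log2 expression
x1 <- signal + rnorm(2000, sd = 0.2)          # array 1
x2 <- signal + 0.3 + 0.05 * signal + rnorm(2000, sd = 0.2)  # array 2: intensity-dependent bias
M <- x2 - x1
A <- (x1 + x2) / 2
fit <- loess(M ~ A, span = 2/3, family = "symmetric")   # span 2/3 as in the text
M_norm <- M - fitted(fit)       # remove the fitted trend
x1_norm <- A - M_norm / 2       # split the correction between the two arrays
x2_norm <- A + M_norm / 2
c(before = mean(M), after = mean(M_norm))
```

A larger span would give a stiffer curve that only removes global trends, while a smaller span would also remove more local wiggles (and some noise with them).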
Finally, quantile normalization uses a simpler concept: it adjusts the distribution of intensities in each array so that all arrays share the same quantiles (and hence a common median). This makes the histograms of the arrays look alike.
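Quantile normalization is short enough to write out in base R, which makes the concept explicit: each value is replaced by the mean of all values that share its rank across the arrays. The function quantile_normalize below is a hypothetical helper, not part of affy; it breaks ties by order of appearance, whereas library implementations may treat ties differently.

```r
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its array
  ref <- rowMeans(apply(x, 2, sort))                  # mean of the values at each rank
  out <- apply(ranks, 2, function(r) ref[r])          # replace each value by its rank's mean
  dimnames(out) <- dimnames(x)
  out
}
set.seed(9)
x <- matrix(rexp(20, rate = 0.1), ncol = 4)   # hypothetical data: 5 probes x 4 arrays
xn <- quantile_normalize(x)
apply(xn, 2, sort)                             # every column now has the same values
```

The order of the values within each array is preserved; only their magnitudes change, which is why the arrays end up with identical distributions.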
Batch effects are systematic errors that arise when samples are processed in different batches. They represent nonbiological differences between the samples in an experiment, caused, for example, by differences in sample preparation or in the hybridization protocol. They can be reduced to some extent by careful experimental design, but they cannot be eliminated completely unless the study is performed in a single batch. Batch effects make it difficult to combine data from different batches and ultimately reduce the power of the statistical analysis, so the data needs appropriate preprocessing before the batches are combined. This recipe presents these preprocessing techniques.
myData
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22283 features, 8 samples
element names: exprs, se.exprs
protocolData: none
phenoData
sampleNames: GSM71019.CEL GSM71020.CEL ... GSM71026.CEL (8 total)
varLabels: sample outcome batch cancer
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133a
The data we used originates from a bladder cancer study, where expression profiling was used to examine the gene expression patterns in superficial transitional cell carcinoma (STCC) with and without surrounding carcinoma in situ (CIS). The data was produced on different dates and, hence, typically shows substantial batch effects; these can lead to confusing or incorrect biological conclusions, owing to the influence of technical artifacts.
The ComBat function (from the sva package) accepts the data matrix that contains the expression values, the batch number of each sample in the input data, and a model matrix that represents the sample information (the biological covariates to preserve). For other optional input arguments, type ?ComBat in your R console.
ComBat allows users to adjust for batch effects in datasets where the batch covariate is known, using the methodology described in Johnson et al. 2007. It uses either a parametric or a non-parametric empirical Bayes framework to adjust the data for batch effects, and it returns an expression matrix that has been corrected for batch effects. The input data are assumed to be cleaned and normalized before batch effect removal.
The ComBat function uses an empirical Bayes method to combine the batches. It estimates location and scale (LS) adjustment parameters for each batch and each gene. Rather than relying on each gene alone, it pools information across the genes in a batch and shrinks the gene-wise estimates towards this common pattern, which keeps the estimates stable even for small batches. The resulting estimates are then used to adjust the batches and remove the batch effect. Patterns in a heatmap of the data can reflect two factors: the intended biological effect or unintended batch effects.
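To see what a location and scale adjustment does, the sketch below implements the plain LS step in base R, without the empirical Bayes shrinkage and without protecting biological covariates, both of which ComBat adds. The helper ls_adjust and the batch labels b1 and b2 are hypothetical; for each gene, the data of every batch is centered, scaled to unit spread, and mapped back onto the gene's overall mean and pooled within-batch standard deviation. Because no biological covariates are protected, this simplified version would also remove any biology that is confounded with batch.

```r
set.seed(10)
n_genes <- 200
batch <- factor(rep(c("b1", "b2"), each = 4))          # hypothetical batch labels
expr <- matrix(rnorm(n_genes * 8, mean = 8), nrow = n_genes)
expr[, batch == "b2"] <- expr[, batch == "b2"] * 1.5 + 1   # batch 2: scaled and shifted

ls_adjust <- function(x, batch) {
  batch <- factor(batch)
  centered <- x
  for (b in levels(batch)) {
    cols <- batch == b
    centered[, cols] <- x[, cols] - rowMeans(x[, cols, drop = FALSE])  # remove batch location
  }
  # pooled within-batch standard deviation of each gene
  pooled_sd <- sqrt(rowSums(centered^2) / (ncol(x) - nlevels(batch)))
  gene_mean <- rowMeans(x)
  out <- x
  for (b in levels(batch)) {
    cols <- batch == b
    batch_sd <- apply(x[, cols, drop = FALSE], 1, sd)                   # batch scale
    out[, cols] <- centered[, cols] / batch_sd * pooled_sd + gene_mean
  }
  out
}
adjusted <- ls_adjust(expr, batch)
# Per-gene difference between batch means: large before, zero after adjustment
summary(rowMeans(expr[, batch == "b2"]) - rowMeans(expr[, batch == "b1"]))
summary(rowMeans(adjusted[, batch == "b2"]) - rowMeans(adjusted[, batch == "b1"]))
```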
The batches can be observed in the heatmap and the clustering tree: before correction, we can see distinct branches as well as distinct color patterns for every batch, whereas after running the ComBat function, some of these branches merge. The following heatmaps show the clustering tree with separate (A) and merged (B) batches, boxed externally (not as part of the R code) for visual emphasis:
Suppose we measure approximately 20,000 human genes in 10 samples and get a 20,000 x 10 matrix of measurements. To explain and graphically represent the patterns in the data, we need to project this multidimensional cloud onto fewer dimensions, by organizing and combining the features so as to explain the maximum variability in the data. Principal Component Analysis (PCA) achieves this through an eigendecomposition of the covariance matrix of the data. It finds orthogonal components (the principal components), each of which represents a direction along which the data is spread out; the first component captures the largest variance, the second the largest remaining variance, and so on.
The prcomp function performs the principal component analysis on the data matrix. It returns the principal components (the scores), their standard deviations (the square roots of the eigenvalues of the covariance matrix), and the rotation matrix, whose columns are the eigenvectors (loadings). The following screenshot shows how the data appears when viewed along selected pairs of principal components:
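The relationship between prcomp's output and the eigenvalues can be checked directly on a small simulated matrix. In this sketch, x is hypothetical data with samples as rows and genes as columns, which is the orientation prcomp expects (an expression matrix with genes in rows would be passed as t(exprs)). Fifty genes differ between two hypothetical groups, so the first component should separate them.

```r
set.seed(11)
# Hypothetical matrix: 10 samples (rows) x 500 genes (columns), two groups
group <- rep(c("A", "B"), each = 5)
x <- matrix(rnorm(10 * 500), nrow = 10)
x[group == "B", 1:50] <- x[group == "B", 1:50] + 3   # 50 genes differ between groups
pca <- prcomp(x, center = TRUE, scale. = FALSE)
summary(pca)$importance[, 1:3]                        # variance explained by PC1 to PC3
# Eigenvalues of the covariance matrix equal the squared sdev values
ev <- eigen(cov(x), symmetric = TRUE, only.values = TRUE)$values
plot(pca$x[, 1], pca$x[, 2], col = factor(group), pch = 19,
     xlab = "PC1", ylab = "PC2", main = "Samples on the first two PCs")
```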
At the genome level, the content of every cell is the same, which means that (with a few exceptions) the same genes are present in all cells of an organism. The question that arises, then, is what makes cells (for example, control and treated samples) different from one another. This is the question we usually ask in microarray-based experiments, and the concept of differential gene expression answers it. It is well established that only a fraction of the genome is expressed in each cell, and this selective, cell-type-dependent expression of genes is the basis of differential gene expression. Thus, it is important to find which genes are differentially expressed in a particular cell. This is achieved by comparing the cells under study with a reference, usually called the control. This recipe explains how to find the DE genes based on the expression levels of the control and treatment cells.
This recipe requires the normalized expression data for treatment and control samples. More replicates always increase the statistical power of such analyses. Note that we always use normalized data for differential expression analysis: as mentioned earlier, normalization makes the arrays comparable, so using the transformed data to find differences makes the process unbiased and scientifically sound. In this recipe, we will use quantile-normalized data. Besides this, we need the experiment and phenotype details, which are part of the AffyBatch or ExpressionSet object. We will also introduce the limma library, which houses one of the most popular methods in R for differential gene expression analysis. For demonstration, we will use preprocessed Affymetrix data of normal colon and colon cancer samples from the antiProfilesData package.
myData_quantile
ExpressionSet (storageMode: lockedEnvironment)
assayData: 5339 features, 16 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM95473 GSM95474 ... GSM95488 (16 total)
varLabels: filename DB_ID ... Status (7 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu133plus2
The limma library is used to analyze gene expression microarray data, in particular with linear models. It implements several linear modeling methods for microarray data that can be used to identify DE genes. The approach described in this recipe first fits a linear model for each gene given a set of arrays. Thereafter, it uses an empirical Bayes method to moderate the gene-wise variances and assess differential expression. This yields the test statistics and the corresponding p-values, log fold changes, and so on.
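The first half of this approach, one linear model per gene, can be reproduced in base R because lm.fit accepts a matrix of responses. The sketch below fits all genes at once and computes ordinary t-statistics, p-values, and Benjamini-Hochberg adjusted p-values on hypothetical simulated data. It deliberately stops short of limma's empirical Bayes step: in practice you would use lmFit, eBayes, and topTable, which moderate the gene-wise variances and are more reliable with few replicates.

```r
set.seed(12)
n_genes <- 1000
status <- factor(rep(c(0, 1), each = 4))          # hypothetical: 4 control, 4 treated
design <- model.matrix(~ status)                   # intercept + treatment effect
y <- matrix(rnorm(n_genes * 8, mean = 8, sd = 0.5), nrow = n_genes)
y[1:50, status == 1] <- y[1:50, status == 1] + 3  # first 50 genes are truly DE
fit <- lm.fit(design, t(y))                        # one linear model per gene
df_res <- nrow(design) - ncol(design)
sigma2 <- colSums(fit$residuals^2) / df_res        # residual variance of each gene
xtx_inv <- solve(crossprod(design))
se <- sqrt(sigma2 * xtx_inv[2, 2])                 # standard error of the treatment effect
logFC <- fit$coefficients[2, ]                     # estimated log2 fold change
t_stat <- logFC / se                               # ordinary (not moderated) t-statistic
p_val <- 2 * pt(-abs(t_stat), df = df_res)
adj_p <- p.adjust(p_val, method = "BH")
head(order(p_val))                                 # indices of the top-ranked genes
```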
The input to the lmFit function that we used was the expression data (a matrix) and the design matrix; however, the input can also be an ExpressionSet. Note also that we created the design matrix manually here to show how it encodes our knowledge of the samples. Nevertheless, the design matrix can also be created directly from the phenotype data as follows:
design <- model.matrix(~ 0 + factor(pData(myData_quantile)$Status))
design
factor(pData(myData_quantile)$Status)0 factor(pData(myData_quantile)$Status)1
1 1 0
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 0 1
10 0 1
11 0 1
12 0 1
13 0 1
14 0 1
15 0 1
16 0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$`factor(pData(myData_quantile)$Status)`
[1] "contr.treatment"
The design matrix describes the experimental condition of each array, with one column per model coefficient. In our case, we have only two conditions, so a simple design matrix that indicates the condition of each array is sufficient (type design in your R console to inspect it). To learn more about creating design matrices, type ?model.matrix in the R console. For multiple comparisons among different conditions, use the makeContrasts function (type ?makeContrasts). The resulting contrast matrix is applied to the fitted model with the contrasts.fit function, after lmFit and before eBayes.
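The roles of the design and contrast matrices can be seen on a single gene with base R. With a no-intercept design (one indicator column per condition), the least-squares coefficients are simply the group means, and a contrast is a weighted combination of those coefficients. The group labels and the six expression values below are hypothetical; in limma, makeContrasts builds the contrast and contrasts.fit applies it to every gene.

```r
group <- factor(rep(c("control", "treated"), each = 3))   # hypothetical labels
design <- model.matrix(~ 0 + group)          # one indicator column per condition
colnames(design) <- levels(group)
y <- c(5.1, 4.9, 5.0, 7.2, 6.8, 7.0)         # one gene's log2 expression values
coefs <- lm.fit(design, y)$coefficients       # least squares: the two group means
contrast <- c(control = -1, treated = 1)     # treated - control
est <- sum(contrast * coefs[names(contrast)])
est                                           # log2 fold change, treated vs control
```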
There are several other packages for differential gene expression analysis, for example, EMA (Easy Microarray Analysis):
library(EMA)
So far, we have seen analyses with two experimental groups, namely treatment and control. However, some experimental designs require comparing more than two groups. To illustrate, let's consider a situation where we have three conditions that need to be compared systematically against each other. This recipe explains how to handle such a situation.
For this recipe, we will use another dataset from the leukemiasEset package. The data is from 60 bone marrow samples of patients with one of the four main types of leukemia (ALL, AML, CLL, and CML) and non-leukemia controls. However, for demonstration purposes, we will use only three samples from each of these categories.
dim(DE2)
[1] 252 6
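With more than two groups, each pairwise comparison becomes one column of a contrast matrix. The sketch below builds all pairwise contrasts in base R; the level names are a hypothetical labeling of the five categories mentioned above, and the result has the same layout as the matrix limma's makeContrasts returns (one row per level, one column per comparison), so it could be passed to contrasts.fit in the same way.

```r
groups <- c("ALL", "AML", "CLL", "CML", "NoL")   # hypothetical level names
pairs <- combn(groups, 2)                          # 2 x 10 matrix of group pairs
cm <- matrix(0, nrow = length(groups), ncol = ncol(pairs),
             dimnames = list(groups, paste(pairs[2, ], pairs[1, ], sep = "-")))
for (k in seq_len(ncol(pairs))) {
  cm[pairs[2, k], k] <- 1    # second group of the pair enters positively
  cm[pairs[1, k], k] <- -1   # first group of the pair enters negatively
}
cm
```

Each column sums to zero, so every contrast compares two group means and is unaffected by the overall expression level of a gene.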
Among such designs, using time as a treatment factor is popular. A cell sample is given a certain treatment, and its expression changes over the course of time; likewise, during stem cell or embryonic development, the expression of genes varies across time points. Handling such time course expression data, though not very different from the protocol described earlier, needs small modifications in our recipe.
The method works very much like the one for static data shown in the previous recipe. The only difference is that we use the time factor while creating the design and contrast matrices. In simple cases with complete data, such data can also be analyzed using the design matrix only. The following plot shows the log fold change of the first six genes of the yeast data along the time points, showing oscillations:
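Treating time as a factor means the design matrix gets one column per time point, and the interesting contrasts compare time points with each other. This base R sketch uses hypothetical time points (0, 15, and 30 minutes) and hypothetical values for a single gene with two replicates per time point; it computes log fold changes relative to time 0 and between consecutive time points, the two kinds of contrasts commonly built with makeContrasts for time course data.

```r
time <- factor(c(0, 0, 15, 15, 30, 30), levels = c(0, 15, 30))   # hypothetical minutes
design <- model.matrix(~ 0 + time)        # one column per time point
colnames(design) <- paste0("t", levels(time))
y <- c(6.0, 6.2, 7.1, 6.9, 5.0, 5.2)      # one gene, two replicates per time point
means <- lm.fit(design, y)$coefficients    # mean log2 expression at each time point
lfc_vs_t0 <- means[-1] - means[1]          # each later time point vs time 0
lfc_consecutive <- diff(means)             # each time point vs the previous one
lfc_vs_t0
lfc_consecutive
```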
Fold change refers to the ratio of the final value to the initial value. In terms of gene expression, it can be defined as the ratio of the final mRNA quantity to the initial one. The initial and final stages can be time points or control and treatment conditions. Fold change expresses the relative change rather than an absolute quantity, which can be ambiguous. It has been suggested that fold changes can serve as more reproducible criteria for selecting DE genes from a dataset. This recipe explains the use of fold changes for this purpose.
The preceding code is straightforward. The log fold changes are computed from the final (treatment) and initial (control) values, using base 2 logarithms. The volcano plot shows statistical significance against the log fold change: it is a simple scatter plot with log fold changes along the x axis and -log(p-values) along the y axis (the log odds of differential expression are sometimes used on the y axis instead). Transforming the p-values to a log scale gives better resolution for visualization. The above plot shows log fold change versus -log p-value:
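For readers who want to see the computation end to end, here is a base R sketch on hypothetical simulated data that is already on the log2 scale, so the log2 fold change is simply the difference of the group means (on a linear scale, you would take log2 of the ratio instead). Per-gene p-values come from Welch t-tests, and the highlighting thresholds (|log2FC| > 1 and p < 0.01) are arbitrary choices for illustration.

```r
set.seed(15)
n_genes <- 2000
# Hypothetical log2 expression: 3 control and 3 treated arrays
control <- matrix(rnorm(n_genes * 3, mean = 8, sd = 0.4), nrow = n_genes)
treated <- matrix(rnorm(n_genes * 3, mean = 8, sd = 0.4), nrow = n_genes)
treated[1:100, ] <- treated[1:100, ] + 1.5     # hypothetical up-regulated genes
# Data are already log2, so the log2 fold change is a difference of means
log2FC <- rowMeans(treated) - rowMeans(control)
p_val <- sapply(seq_len(n_genes), function(i) t.test(treated[i, ], control[i, ])$p.value)
highlight <- abs(log2FC) > 1 & p_val < 0.01    # arbitrary thresholds for coloring
plot(log2FC, -log10(p_val), pch = 20, col = ifelse(highlight, "red", "grey40"),
     xlab = "log2 fold change", ylab = "-log10(p-value)", main = "Volcano plot")
```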
Once we know the DE genes from our array data, we have a set of genes that are likely involved in the biological difference under study. To learn more about this set at a biological level, we need to know the functions of its genes, which we can do by analyzing the Gene Ontology (GO) categories that are enriched in the set. This recipe is about the enrichment analysis of gene sets with GO terms.
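The hyperGTest call in this recipe is based on a hypergeometric test for over-representation: for each GO term, it asks whether the DE genes contain more genes annotated to the term than expected by chance. The calculation for a single term can be done with base R's phyper; all four counts below are hypothetical.

```r
# Hypothetical counts for one GO term
N <- 5000   # genes in the universe
K <- 40     # universe genes annotated to the term
n <- 250    # DE (selected) genes
k <- 8      # DE genes annotated to the term
expected <- n * K / N                                       # count expected by chance
p_over <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)    # P(X >= k)
c(expected = expected, observed = k, p_value = p_over)
```

Observing 8 annotated genes where about 2 are expected gives a small p-value, so the term would be reported as enriched at the 0.05 cutoff used below.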
# First, install and load the annotation database and GOstats library as follows:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("hgu95av2.db", "GOstats", "Rgraphviz", "biomaRt"))
library(hgu95av2.db)
library(GOstats)
library(biomaRt)
# Prepare the input data from the results of the leukemia data analysis (Working with
# the data of multiple classes recipe). Create two sets, one that consists of all the
# genes in the data and the other that consists of DE genes, as follows:
all_genes <- rownames(tested2)
sel_genes <- rownames(DE2)
# Map these sets to their Entrez IDs as follows:
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl")) # set the mart
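# getBM returns a data frame, so extract the ID column as a unique character vector.
# The attribute is named "entrezgene_id" in current Ensembl releases ("entrezgene" in older ones).
all_genes <- unique(as.character(na.omit(getBM(filters = "ensembl_gene_id", attributes = "entrezgene_id", values = all_genes, mart = mart)$entrezgene_id))) # Entrez IDs for all genes
sel_genes <- unique(as.character(na.omit(getBM(filters = "ensembl_gene_id", attributes = "entrezgene_id", values = sel_genes, mart = mart)$entrezgene_id))) # Entrez IDs for DE genes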
# Now, define a cutoff for the test statistics as follows:
hgCutoff <- 0.05
# The next thing you need is a GOHyperGParams object that will be used as an
# input parameter for the enrichment computations. It can be computed with the
# following function:
params <- new("GOHyperGParams", geneIds=sel_genes, universeGeneIds= all_genes, annotation="hgu95av2.db", ontology="BP", pvalueCutoff=hgCutoff, conditional=FALSE, testDirection="over")
# Once you have your GOHyperGParams object, perform a hypergeometric
# test to get the p-value for the GO annotations as follows:
hgOver <- hyperGTest(params)
# Check the summary of the object that was created by typing the following command:
summary(hgOver)
# Get the number of genes associated with the different categories as follows:
geneCounts(hgOver)
GO:0046501 GO:0006779 GO:0006778 GO:0033014 GO:0042168 GO:0006782 GO:0006783 GO:0033013 GO:0051188 GO:1990267 GO:0042440 GO:0019755 GO:0046148
6 7 8 7 7 5 6 8 10 10 7 4 6
GO:0055080 GO:0050801 GO:0098771 GO:0055065 GO:0010038 GO:0051597 GO:0044772 GO:0055082 GO:0048821 GO:0030218 GO:0051261 GO:0032536 GO:0000086
19 20 19 17 13 3 15 19 4 7 6 3 9
GO:0044770 GO:0051494 GO:1901880 GO:0035378 GO:0070541 GO:0046688 GO:0017085 GO:0044839 GO:0034101 GO:0043242 GO:0051693 GO:0006873 GO:0032272
15 7 5 2 2 4 3 9 7 5 4 16 5
GO:0030835 GO:0055015 GO:0055072 GO:0048878 GO:0010039 GO:0055002 GO:1901879 GO:0055076 GO:0015701 GO:0051186 GO:0010961 GO:0032535 GO:0030003
4 3 5 23 4 7 5 6 4 11 2 11 15
GO:0019725 GO:0006638 GO:0010288 GO:0002262 GO:0030834 GO:0071248 GO:0051983 GO:0010639 GO:0055001 GO:0030071 GO:0006833 GO:0055012 GO:0007091
19 6 3 7 4 7 5 9 7 4 3 3 4
GO:0010965 GO:0030837 GO:1902099 GO:0007079 GO:0015670 GO:0031133 GO:0071918 GO:0043244 GO:0030042 GO:0044784 GO:0051306 GO:0042592 GO:0090066
4 4 4 2 2 2 2 5 4 4 4 29 13
GO:0043624 GO:0008608 GO:0007052 GO:0051303 GO:0050000 GO:0006875 GO:0000278 GO:0015840 GO:0042493 GO:0042044 GO:0033047 GO:0007088 GO:0071241
6 3 4 4 4 13 20 2 13 3 4 6 7
GO:0045931 GO:0006641 GO:0071158 GO:0000212 GO:0003062 GO:0003097 GO:0042908 GO:0051305 GO:0051988 GO:0071280 GO:0072488 GO:0061515 GO:0046685
6 5 5 2 2 2 2 2 2 2 2 4 3
GO:0051304 GO:0033045 GO:0090068 GO:1903047 GO:0007100 GO:0043249 GO:0051299 GO:0051146 GO:0007010 GO:0006639 GO:0031333 GO:0048872 GO:0051301
4 4 8 18 2 2 2 8 20 5 5 8 12
GO:0010035 GO:0007346 GO:0006977 GO:0072413 GO:0072431 GO:1902400 GO:1902402 GO:1902403 GO:0010960 GO:0015793 GO:0051382 GO:0007093 GO:0051783
13 12 4 4 4 4 4 4 2 2 2 6 6
GO:0072401 GO:0072422 GO:0071156 GO:0051310 GO:0009636 GO:0051258 GO:0072395 GO:0000075 GO:0009992 GO:0051004 GO:0000070 GO:0007586 GO:0031571
4 4 5 3 7 7 4 7 2 2 5 5 4
GO:0044783 GO:0044819 GO:0045787 GO:0032846 GO:0046916 GO:0051493 GO:0015791 GO:0035404 GO:0046689 GO:0051383 GO:0033044 GO:0050853 GO:0000819
4 4 9 7 4 10 2 2 2 2 5 3 6
GO:0008361 GO:0043241 GO:0003091 GO:0006879 GO:1902589 GO:0015669 GO:0015695 GO:0030097 GO:0016572 GO:0030099 GO:0032984 GO:0015696 GO:0045717
6 6 3 3 24 2 2 15 3 9 6 2 2
GO:0051642 GO:0051984 GO:0000741 GO:0003065 GO:0006797 GO:0007056 GO:0007057 GO:0007344 GO:0010868 GO:0010900 GO:0015881 GO:0021503 GO:0033326
2 2 1 1 1 1 1 1 1 1 1 1 1
GO:0035227 GO:0035229 GO:0035377 GO:0045196 GO:0045200 GO:0046680 GO:0046901 GO:0048203 GO:0051585 GO:0051620 GO:0051621 GO:0051622 GO:0051661
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0051945 GO:0060217 GO:0060375 GO:0070494 GO:0070495 GO:0071283 GO:0071284 GO:0085018 GO:0099607 GO:0100024 GO:1900195 GO:1900402 GO:1901731
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:1902303 GO:1902598 GO:1902861 GO:1902957 GO:1903126 GO:1903282 GO:1903284 GO:1903285 GO:1903892 GO:1905447 GO:2000468 GO:2000470 GO:2000775
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0033043 GO:0010389 GO:0046785 GO:0051055 GO:1901990 GO:0020027 GO:0021591 GO:1901380 GO:2000369 GO:0000082 GO:0008360 GO:0051234 GO:1902578
17 3 3 3 8 2 2 2 2 7 5 55 40
GO:0030001 GO:0030049 GO:0033275 GO:0044843 GO:0008064 GO:2000134 GO:0030832 GO:0048534 GO:1902807 GO:0051640 GO:1901991 GO:0065008 GO:0045471
14 3 3 7 5 4 5 15 4 10 5 46 5
GO:0090280 GO:1901987 GO:0006855 GO:0042744 GO:0045922 GO:0044773 GO:0010959 GO:0098813 GO:0044711 GO:1904063 GO:0070925 GO:0001556 GO:0014046
3 8 2 2 2 4 8 6 21 3 10 2 2
GO:0014059 GO:0035584 GO:0050901 GO:0051656 GO:0044774 GO:0051928 GO:0006874 GO:0072583 GO:1902749 GO:0000226 GO:0055007 GO:0070507 GO:0090279
2 2 2 9 4 4 9 3 3 8 4 4 4
GO:1901988 GO:0043462 GO:0034508 GO:0060045 GO:0090207 GO:0045930 GO:0002520 GO:0032271 GO:0022411 GO:0050891 GO:0003064 GO:0010726 GO:0015838
5 3 2 2 2 6 15 5 10 3 1 1 1
GO:0015879 GO:0030185 GO:0030221 GO:0032364 GO:0035585 GO:0044752 GO:0045799 GO:0045914 GO:0045963 GO:0046878 GO:0046900 GO:0051351 GO:0051694
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0051987 GO:0055011 GO:0055014 GO:0060623 GO:0070560 GO:0090324 GO:0097069 GO:0097089 GO:1901856 GO:1902302 GO:1902603 GO:1903891 GO:1904387
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:1904753 GO:2000048 GO:2000295 GO:0002376 GO:0055074 GO:0007017 GO:0061756 GO:0006812 GO:0010564 GO:0006979 GO:0051179 GO:0008154 GO:0044699
1 1 1 34 9 10 2 15 11 9 64 5 115
GO:0006898 GO:0014823 GO:0031109 GO:0055013 GO:0072503 GO:0006884 GO:0007094 GO:0009648 GO:0015893 GO:0060343 GO:0071173 GO:0071549 GO:0070252
7 3 3 3 9 2 2 2 2 2 2 2 4
GO:0034766 GO:0007098 GO:0055006 GO:0006810 GO:0014706 GO:0055085 GO:0006536 GO:0048741 GO:2000404 GO:0030330 GO:0045833 GO:0007067 GO:0006811
3 3 3 52 8 18 2 2 2 4 3 8 20
GO:0032886 GO:0015711 GO:0022600 GO:0007080 GO:0014904 GO:0031111 GO:0035162 GO:0043267 GO:0045841 GO:0048536 GO:0055023 GO:0071174 GO:2000816
4 7 3 2 2 2 2 2 2 2 2 2 2
GO:0007163 GO:0044092 GO:0072507 GO:0051726 GO:0030104 GO:0034763 GO:0003014 GO:0042692 GO:0000160 GO:0005988 GO:0005989 GO:0006030 GO:0006032
5 17 9 16 3 3 4 8 1 1 1 1 1
GO:0010040 GO:0010041 GO:0010266 GO:0010572 GO:0010899 GO:0010915 GO:0010916 GO:0014038 GO:0015677 GO:0030950 GO:0030997 GO:0032227 GO:0032792
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0033239 GO:0035583 GO:0042048 GO:0046877 GO:0051005 GO:0051581 GO:0051611 GO:0051612 GO:0055059 GO:0060018 GO:0060282 GO:0061624 GO:0070295
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0070837 GO:0071288 GO:0071692 GO:0071694 GO:0090306 GO:0097068 GO:0097623 GO:1902896 GO:1902956 GO:1903232 GO:1903237 GO:1904386 GO:1904715
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:1905446 GO:0000272 GO:0030261 GO:1902100 GO:0060537 GO:0030833 GO:0035051 GO:0051017 GO:0061572 GO:0021700 GO:0007015 GO:0019433 GO:0031577
1 2 2 2 8 4 4 4 4 6 7 2 2
GO:0033048 GO:0007059 GO:0019217 GO:0051279 GO:0044763 GO:0097305 GO:0006970 GO:0098655 GO:0006730 GO:0010799 GO:0042398 GO:0071705 GO:0031329
2 6 3 3 107 5 3 11 2 2 2 12 8
GO:0048259 GO:0042770 GO:0022604 GO:0022402 GO:0033046 GO:0046461 GO:0046464 GO:0046605 GO:0048599 GO:0060043 GO:0060421 GO:0071548 GO:0007050
3 4 9 19 2 2 2 2 2 2 2 2 6
GO:0051128 GO:0001951 GO:0001996 GO:0005981 GO:0007144 GO:0010044 GO:0010985 GO:0015886 GO:0031049 GO:0031052 GO:0032532 GO:0034112 GO:0034382
30 1 1 1 1 1 1 1 1 1 1 1 1
GO:0035581 GO:0035646 GO:0043485 GO:0043987 GO:0046602 GO:0048757 GO:0051340 GO:0051610 GO:0051657 GO:0060374 GO:0071830 GO:0072318 GO:0072319
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0072719 GO:0090209 GO:0098915 GO:1900193 GO:1901072 GO:1904995 GO:2001046 GO:0055067 GO:2000045 GO:0008344 GO:0006631 GO:0060048 GO:0009994
1 1 1 1 1 1 1 4 4 3 7 4 2
GO:0015872 GO:0042304 GO:0051985 GO:0090382 GO:0043255 GO:0043623 GO:0030041 GO:1902806 GO:0018107 GO:0001960 GO:0006182 GO:0015682 GO:0033572
2 2 2 2 3 9 4 4 3 2 2 2 2
GO:0050433 GO:0072512 GO:2000401 GO:0048738 GO:0000077 GO:0030048 GO:0042417 GO:0051281 GO:0055025 GO:0046486 GO:0015850 GO:0030100 GO:0001672
2 2 2 5 4 4 2 2 2 7 5 5 1
GO:0001842 GO:0006001 GO:0006787 GO:0010891 GO:0010898 GO:0021670 GO:0022614 GO:0031442 GO:0033015 GO:0033483 GO:0033603 GO:0034447 GO:0034501
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0035090 GO:0035733 GO:0040016 GO:0042045 GO:0042167 GO:0042760 GO:0043476 GO:0043482 GO:0044539 GO:0045199 GO:0045541 GO:0045842 GO:0046149
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0046348 GO:0046351 GO:0050703 GO:0051315 GO:0071472 GO:0090206 GO:0090315 GO:1901970 GO:1902101 GO:1903431 GO:2000210 GO:0000041 GO:0018210
1 1 1 1 1 1 1 1 1 1 1 3 3
GO:0031100 GO:0032781 GO:0045839 GO:0051924 GO:0031145 GO:0043154 GO:0060191 GO:1904062 GO:2000021 GO:0006820 GO:0046503 GO:0050432 GO:0070542
3 2 2 5 3 3 3 5 5 8 2 2 2
GO:0072678 GO:0051297 GO:0048469 GO:0032956 GO:0031331 GO:0010948 GO:0031570 GO:0007589 GO:0042743 GO:0060761 GO:0043270 GO:0006109 GO:0010522
2 3 4 6 6 5 4 3 2 2 5 4 3
GO:1904064 GO:0045861 GO:0030029 GO:0001561 GO:0003321 GO:0006983 GO:0009635 GO:0010269 GO:0010457 GO:0010873 GO:0010889 GO:0015697 GO:0019740
3 6 11 1 1 1 1 1 1 1 1 1 1
GO:0021527 GO:0022417 GO:0030952 GO:0032372 GO:0032375 GO:0034379 GO:0034638 GO:0035434 GO:0045721 GO:0050872 GO:0051006 GO:0060586 GO:0070307
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0070345 GO:0071435 GO:0071474 GO:0072718 GO:0090166 GO:0090527 GO:0098722 GO:1901678 GO:1903236 GO:1903421 GO:1904714 GO:2000675 GO:0009108
1 1 1 1 1 1 1 1 1 1 1 1 3
GO:0014902 GO:0007077 GO:0031110 GO:0045840 GO:0055021 GO:1902930 GO:0008643 GO:0051346 GO:0015672 GO:0043254 GO:0051129 GO:0007062 GO:0030282
3 2 2 2 2 2 4 7 8 7 10 3 3
GO:2000117 GO:0006936 GO:0009894 GO:0035384 GO:0060038 GO:0071616 GO:0034220 GO:0031023 GO:0044255 GO:0072330 GO:0050790 GO:0042102 GO:0010466
3 7 9 2 2 2 13 3 14 4 30 3 5
GO:0048662 GO:0009314 GO:0002082 GO:0002430 GO:0002692 GO:0006534 GO:0006991 GO:0007635 GO:0009396 GO:0010872 GO:0010984 GO:0032025 GO:0032610
2 8 1 1 1 1 1 1 1 1 1 1 1
GO:0032933 GO:0036445 GO:0043471 GO:0046415 GO:0046541 GO:0048842 GO:0051409 GO:0051584 GO:0051639 GO:0051940 GO:0055057 GO:0060281 GO:0060315
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0060456 GO:0061365 GO:0070341 GO:0070344 GO:0071372 GO:0071459 GO:0071501 GO:0072537 GO:1900102 GO:1902669 GO:2000047 GO:0051235 GO:0006984
1 1 1 1 1 1 1 1 1 1 1 6 2
GO:0010043 GO:0030397 GO:0045123 GO:0046622 GO:0051081 GO:0048639 GO:0070509 GO:1903050 GO:0006584 GO:0009712 GO:0030433 GO:0032465 GO:0046470
2 2 2 2 2 4 4 5 2 2 2 2 2
GO:0048477 GO:0048747 GO:0050994 GO:0051784 GO:0050770 GO:0000280 GO:0022029 GO:0061383 GO:0071277 GO:0006941 GO:0008016 GO:0034599 GO:0001696
2 2 2 2 4 8 2 2 2 4 5 5 1
GO:0001956 GO:0001993 GO:0005984 GO:0006069 GO:0007168 GO:0010896 GO:0032780 GO:0033129 GO:0033160 GO:0034497 GO:0048671 GO:0060215 GO:0071281
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0090266 GO:0097286 GO:1903504 GO:1904706 GO:1904738 GO:1904752 GO:0044765 GO:0045786 GO:0006826 GO:0010524 GO:0032410 GO:0046686 GO:0051785
1 1 1 1 1 1 33 8 2 2 2 2 2
GO:0060420 GO:0034767 GO:1903052 GO:0030239 GO:0071385 GO:1901992 GO:0098660 GO:0032844 GO:0098662 GO:0045927 GO:0030036 GO:0032970 GO:0006897
2 3 4 2 2 2 10 8 9 5 9 6 10
GO:0007009 GO:0014855 GO:0021885 GO:0046068 GO:0048146 GO:1901989 GO:1904427 GO:2001251 GO:0006600 GO:0006750 GO:0010310 GO:0010763 GO:0010801
5 2 2 2 2 2 2 2 1 1 1 1 1
GO:0010866 GO:0016180 GO:0019184 GO:0019852 GO:0021516 GO:0030011 GO:0030730 GO:0031115 GO:0033700 GO:0034433 GO:0034434 GO:0034435 GO:0035067
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0035428 GO:0036500 GO:0042541 GO:0045780 GO:0046852 GO:0051280 GO:0051764 GO:0060346 GO:0061684 GO:0070493 GO:0086013 GO:0090231 GO:1900115
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:1900116 GO:1902931 GO:1904659 GO:1904994 GO:2000674 GO:2001044 GO:0048513 GO:1903362 GO:0043086 GO:0001666 GO:0016202 GO:0034764 GO:0051209
1 1 1 1 1 1 38 5 12 6 3 3 3
GO:0051282 GO:0051283 GO:0051937 GO:0071320 GO:0001935 GO:0043271 GO:0051302 GO:1901861 GO:0048634 GO:0051208 GO:0097553 GO:1902656 GO:0051952
3 3 2 2 3 3 3 3 3 3 3 3 2
GO:0071384 GO:1901379 GO:2001252 GO:0036293 GO:0006067 GO:0006878 GO:0009437 GO:0010642 GO:0016540 GO:0021781 GO:0030157 GO:0030502 GO:0032530
2 2 2 6 1 1 1 1 1 1 1 1 1
GO:0032769 GO:0033605 GO:0035815 GO:0036152 GO:0042416 GO:0043206 GO:0044331 GO:0045978 GO:0046655 GO:0050812 GO:0051284 GO:0051583 GO:0051589
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0051934 GO:0060192 GO:0060347 GO:0060732 GO:0090493 GO:0090494 GO:0097050 GO:1900025 GO:1903579 GO:1904355 GO:1903169 GO:0040008 GO:0008015
1 1 1 1 1 1 1 1 1 1 3 10 9
GO:1903364 GO:0098657 GO:0072331 GO:0048285 GO:0008610 GO:0003013 GO:0043161 GO:0055024 GO:0070527 GO:0007626 GO:0033002 GO:0065009 GO:0009896
4 2 5 8 9 9 6 2 2 4 4 34 6
GO:0045859 GO:0006633 GO:0010675 GO:0051592 GO:0007049 GO:0016192 GO:0006825 GO:0009312 GO:0032354 GO:0034219 GO:0035728 GO:0035729 GO:0035810
11 3 3 3 21 18 1 1 1 1 1 1 1
GO:0042559 GO:0043457 GO:0044804 GO:0050930 GO:0051895 GO:0051953 GO:0055070 GO:0070168 GO:0070633 GO:0071285 GO:0071872 GO:0090026 GO:0090330
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:2000757 GO:0006026 GO:0016051 GO:0031214 GO:0046777 GO:0046718 GO:0007588 GO:0036503 GO:0031099 GO:0030260 GO:0044409 GO:0045766 GO:0051806
1 2 4 3 5 3 2 2 4 3 3 3 3
GO:0051828 GO:0072659 GO:0006968 GO:0015837 GO:0045844 GO:0048199 GO:0048636 GO:0051225 GO:1901863 GO:0010769 GO:0019216 GO:0001774 GO:0001921
3 4 2 2 2 2 2 2 2 5 5 1 1
GO:0006577 GO:0006895 GO:0007020 GO:0008356 GO:0010523 GO:0010919 GO:0014048 GO:0015874 GO:0019682 GO:0033197 GO:0035855 GO:0036151 GO:0042572
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0048103 GO:0048148 GO:0050685 GO:0050849 GO:0051016 GO:0051187 GO:0051590 GO:0061430 GO:0071539 GO:0071801 GO:0086011 GO:1901071 GO:1901984
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:1903392 GO:2000036 GO:0006637 GO:0009064 GO:0030500 GO:0035383 GO:0046890 GO:0050671 GO:0051238 GO:0060047 GO:0070482 GO:0018105 GO:0032946
1 1 2 2 2 2 3 3 3 5 6 5 3
GO:0055017 GO:0007517 GO:0043269 GO:0061061 GO:0003015 GO:0006000 GO:0007026 GO:0007095 GO:0007250 GO:0010971 GO:0023019 GO:0030214 GO:0030220
2 6 8 9 5 1 1 1 1 1 1 1 1
GO:0032225 GO:0033127 GO:0033158 GO:0034375 GO:0036344 GO:0045540 GO:0045830 GO:0048308 GO:0048313 GO:0050995 GO:0051580 GO:0060307 GO:0070293
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0071371 GO:0071732 GO:0071871 GO:0090208 GO:0097067 GO:0098659 GO:0099587 GO:0099625 GO:1902236 GO:0001938 GO:0036498 GO:1901655
1 1 1 1 1 1 1 1 1 2 2 2
[ reached getOption("max.print") -- omitted 1978 entries ]
# Similarly, get the number of universe genes annotated to each GO category:
universeCounts(hgOver)
GO:0046501 GO:0006779 GO:0006778 GO:0033014 GO:0042168 GO:0006782 GO:0006783 GO:0033013 GO:0051188 GO:1990267 GO:0042440 GO:0019755 GO:0046148
9 15 24 16 19 7 13 36 88 99 44 9 37
GO:0055080 GO:0050801 GO:0098771 GO:0055065 GO:0010038 GO:0051597 GO:0044772 GO:0055082 GO:0048821 GO:0030218 GO:0051261 GO:0032536 GO:0000086
417 456 426 369 235 7 312 471 19 76 54 8 129
GO:0044770 GO:0051494 GO:1901880 GO:0035378 GO:0070541 GO:0046688 GO:0017085 GO:0044839 GO:0034101 GO:0043242 GO:0051693 GO:0006873 GO:0032272
328 78 36 2 2 21 9 137 83 39 22 381 40
GO:0030835 GO:0055015 GO:0055072 GO:0048878 GO:0010039 GO:0055002 GO:1901879 GO:0055076 GO:0015701 GO:0051186 GO:0010961 GO:0032535 GO:0030003
23 10 42 685 25 91 44 67 26 217 3 218 373
GO:0019725 GO:0006638 GO:0010288 GO:0002262 GO:0030834 GO:0071248 GO:0051983 GO:0010639 GO:0055001 GO:0030071 GO:0006833 GO:0055012 GO:0007091
540 72 13 100 29 101 50 163 102 30 14 14 31
GO:0010965 GO:0030837 GO:1902099 GO:0007079 GO:0015670 GO:0031133 GO:0071918 GO:0043244 GO:0030042 GO:0044784 GO:0051306 GO:0042592 GO:0090066
31 31 31 4 4 4 4 53 32 32 32 1032 317
GO:0043624 GO:0008608 GO:0007052 GO:0051303 GO:0050000 GO:0006875 GO:0000278 GO:0015840 GO:0042493 GO:0042044 GO:0033047 GO:0007088 GO:0071241
80 16 34 34 35 327 626 5 330 18 37 88 120
GO:0045931 GO:0006641 GO:0071158 GO:0000212 GO:0003062 GO:0003097 GO:0042908 GO:0051305 GO:0051988 GO:0071280 GO:0072488 GO:0061515 GO:0046685
90 63 64 6 6 6 6 6 6 6 6 40 20
GO:0051304 GO:0033045 GO:0090068 GO:1903047 GO:0007100 GO:0043249 GO:0051299 GO:0051146 GO:0007010 GO:0006639 GO:0031333 GO:0048872 GO:0051301
41 42 162 572 7 7 7 166 675 71 71 169 324
GO:0010035 GO:0007346 GO:0006977 GO:0072413 GO:0072431 GO:1902400 GO:1902402 GO:1902403 GO:0010960 GO:0015793 GO:0051382 GO:0007093 GO:0051783
366 326 47 47 47 47 47 47 8 8 8 105 106
GO:0072401 GO:0072422 GO:0071156 GO:0051310 GO:0009636 GO:0051258 GO:0072395 GO:0000075 GO:0009992 GO:0051004 GO:0000070 GO:0007586 GO:0031571
49 49 77 26 143 143 50 144 9 9 79 79 51
GO:0044783 GO:0044819 GO:0045787 GO:0032846 GO:0046916 GO:0051493 GO:0015791 GO:0035404 GO:0046689 GO:0051383 GO:0033044 GO:0050853 GO:0000819
51 51 223 149 53 265 10 10 10 10 83 29 118
GO:0008361 GO:0043241 GO:0003091 GO:0006879 GO:1902589 GO:0015669 GO:0015695 GO:0030097 GO:0016572 GO:0030099 GO:0032984 GO:0015696 GO:0045717
118 118 30 30 925 11 11 494 31 237 124 12 12
GO:0051642 GO:0051984 GO:0000741 GO:0003065 GO:0006797 GO:0007056 GO:0007057 GO:0007344 GO:0010868 GO:0010900 GO:0015881 GO:0021503 GO:0033326
12 12 1 1 1 1 1 1 1 1 1 1 1
GO:0035227 GO:0035229 GO:0035377 GO:0045196 GO:0045200 GO:0046680 GO:0046901 GO:0048203 GO:0051585 GO:0051620 GO:0051621 GO:0051622 GO:0051661
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0051945 GO:0060217 GO:0060375 GO:0070494 GO:0070495 GO:0071283 GO:0071284 GO:0085018 GO:0099607 GO:0100024 GO:1900195 GO:1900402 GO:1901731
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:1902303 GO:1902598 GO:1902861 GO:1902957 GO:1903126 GO:1903282 GO:1903284 GO:1903285 GO:1903892 GO:1905447 GO:2000468 GO:2000470 GO:2000775
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0033043 GO:0010389 GO:0046785 GO:0051055 GO:1901990 GO:0020027 GO:0021591 GO:1901380 GO:2000369 GO:0000082 GO:0008360 GO:0051234 GO:1902578
602 34 34 34 205 13 13 13 13 167 95 2682 1823
GO:0030001 GO:0030049 GO:0033275 GO:0044843 GO:0008064 GO:2000134 GO:0030832 GO:0048534 GO:1902807 GO:0051640 GO:1901991 GO:0065008 GO:0045471
472 36 36 170 97 65 98 524 66 297 99 2181 100
GO:0090280 GO:1901987 GO:0006855 GO:0042744 GO:0045922 GO:0044773 GO:0010959 GO:0098813 GO:0044711 GO:1904063 GO:0070925 GO:0001556 GO:0014046
38 216 15 15 15 68 217 139 831 40 307 16 16
GO:0014059 GO:0035584 GO:0050901 GO:0051656 GO:0044774 GO:0051928 GO:0006874 GO:0072583 GO:1902749 GO:0000226 GO:0055007 GO:0070507 GO:0090279
16 16 16 265 71 71 266 41 41 224 72 72 72
GO:1901988 GO:0043462 GO:0034508 GO:0060045 GO:0090207 GO:0045930 GO:0002520 GO:0032271 GO:0022411 GO:0050891 GO:0003064 GO:0010726 GO:0015838
107 42 17 17 17 146 550 109 316 43 2 2 2
GO:0015879 GO:0030185 GO:0030221 GO:0032364 GO:0035585 GO:0044752 GO:0045799 GO:0045914 GO:0045963 GO:0046878 GO:0046900 GO:0051351 GO:0051694
2 2 2 2 2 2 2 2 2 2 2 2 2
GO:0051987 GO:0055011 GO:0055014 GO:0060623 GO:0070560 GO:0090324 GO:0097069 GO:0097089 GO:1901856 GO:1902302 GO:1902603 GO:1903891 GO:1904387
2 2 2 2 2 2 2 2 2 2 2 2 2
GO:1904753 GO:2000048 GO:2000295 GO:0002376 GO:0055074 GO:0007017 GO:0061756 GO:0006812 GO:0010564 GO:0006979 GO:0051179 GO:0008154 GO:0044699
2 2 2 1553 274 319 18 556 366 276 3305 112 6648
GO:0006898 GO:0014823 GO:0031109 GO:0055013 GO:0072503 GO:0006884 GO:0007094 GO:0009648 GO:0015893 GO:0060343 GO:0071173 GO:0071549 GO:0070252
191 45 45 45 278 19 19 19 19 19 19 19 78
GO:0034766 GO:0007098 GO:0055006 GO:0006810 GO:0014706 GO:0055085 GO:0006536 GO:0048741 GO:2000404 GO:0030330 GO:0045833 GO:0007067 GO:0006811
46 47 47 2610 240 720 20 20 20 81 48 243 828
GO:0032886 GO:0015711 GO:0022600 GO:0007080 GO:0014904 GO:0031111 GO:0035162 GO:0043267 GO:0045841 GO:0048536 GO:0055023 GO:0071174 GO:2000816
82 201 49 21 21 21 21 21 21 21 21 21 21
GO:0007163 GO:0044092 GO:0072507 GO:0051726 GO:0030104 GO:0034763 GO:0003014 GO:0042692 GO:0000160 GO:0005988 GO:0005989 GO:0006030 GO:0006032
120 679 291 630 50 50 84 248 3 3 3 3 3
GO:0010040 GO:0010041 GO:0010266 GO:0010572 GO:0010899 GO:0010915 GO:0010916 GO:0014038 GO:0015677 GO:0030950 GO:0030997 GO:0032227 GO:0032792
3 3 3 3 3 3 3 3 3 3 3 3 3
GO:0033239 GO:0035583 GO:0042048 GO:0046877 GO:0051005 GO:0051581 GO:0051611 GO:0051612 GO:0055059 GO:0060018 GO:0060282 GO:0061624 GO:0070295
3 3 3 3 3 3 3 3 3 3 3 3 3
GO:0070837 GO:0071288 GO:0071692 GO:0071694 GO:0090306 GO:0097068 GO:0097623 GO:1902896 GO:1902956 GO:1903232 GO:1903237 GO:1904386 GO:1904715
3 3 3 3 3 3 3 3 3 3 3 3 3
GO:1905446 GO:0000272 GO:0030261 GO:1902100 GO:0060537 GO:0030833 GO:0035051 GO:0051017 GO:0061572 GO:0021700 GO:0007015 GO:0019433 GO:0031577
3 22 22 22 250 87 87 87 87 166 209 23 23
GO:0033048 GO:0007059 GO:0019217 GO:0051279 GO:0044763 GO:0097305 GO:0006970 GO:0098655 GO:0006730 GO:0010799 GO:0042398 GO:0071705 GO:0031329
23 167 53 53 6139 128 54 398 24 24 24 447 259
GO:0048259 GO:0042770 GO:0022604 GO:0022402 GO:0033046 GO:0046461 GO:0046464 GO:0046605 GO:0048599 GO:0060043 GO:0060421 GO:0071548 GO:0007050
55 91 307 809 25 25 25 25 25 25 25 25 173
GO:0051128 GO:0001951 GO:0001996 GO:0005981 GO:0007144 GO:0010044 GO:0010985 GO:0015886 GO:0031049 GO:0031052 GO:0032532 GO:0034112 GO:0034382
1408 4 4 4 4 4 4 4 4 4 4 4 4
GO:0035581 GO:0035646 GO:0043485 GO:0043987 GO:0046602 GO:0048757 GO:0051340 GO:0051610 GO:0051657 GO:0060374 GO:0071830 GO:0072318 GO:0072319
4 4 4 4 4 4 4 4 4 4 4 4 4
GO:0072719 GO:0090209 GO:0098915 GO:1900193 GO:1901072 GO:1904995 GO:2001046 GO:0055067 GO:2000045 GO:0008344 GO:0006631 GO:0060048 GO:0009994
4 4 4 4 4 4 4 93 93 57 220 94 26
GO:0015872 GO:0042304 GO:0051985 GO:0090382 GO:0043255 GO:0043623 GO:0030041 GO:1902806 GO:0018107 GO:0001960 GO:0006182 GO:0015682 GO:0033572
26 26 26 26 58 314 96 96 59 27 27 27 27
GO:0050433 GO:0072512 GO:2000401 GO:0048738 GO:0000077 GO:0030048 GO:0042417 GO:0051281 GO:0055025 GO:0046486 GO:0015850 GO:0030100 GO:0001672
27 27 27 137 97 97 28 28 28 229 141 141 5
GO:0001842 GO:0006001 GO:0006787 GO:0010891 GO:0010898 GO:0021670 GO:0022614 GO:0031442 GO:0033015 GO:0033483 GO:0033603 GO:0034447 GO:0034501
5 5 5 5 5 5 5 5 5 5 5 5 5
GO:0035090 GO:0035733 GO:0040016 GO:0042045 GO:0042167 GO:0042760 GO:0043476 GO:0043482 GO:0044539 GO:0045199 GO:0045541 GO:0045842 GO:0046149
5 5 5 5 5 5 5 5 5 5 5 5 5
GO:0046348 GO:0046351 GO:0050703 GO:0051315 GO:0071472 GO:0090206 GO:0090315 GO:1901970 GO:1902101 GO:1903431 GO:2000210 GO:0000041 GO:0018210
5 5 5 5 5 5 5 5 5 5 5 62 62
GO:0031100 GO:0032781 GO:0045839 GO:0051924 GO:0031145 GO:0043154 GO:0060191 GO:1904062 GO:2000021 GO:0006820 GO:0046503 GO:0050432 GO:0070542
62 29 29 142 63 63 63 143 143 279 30 30 30
GO:0072678 GO:0051297 GO:0048469 GO:0032956 GO:0031331 GO:0010948 GO:0031570 GO:0007589 GO:0042743 GO:0060761 GO:0043270 GO:0006109 GO:0010522
30 64 103 189 190 146 104 65 31 31 147 105 66
GO:1904064 GO:0045861 GO:0030029 GO:0001561 GO:0003321 GO:0006983 GO:0009635 GO:0010269 GO:0010457 GO:0010873 GO:0010889 GO:0015697 GO:0019740
66 193 435 6 6 6 6 6 6 6 6 6 6
GO:0021527 GO:0022417 GO:0030952 GO:0032372 GO:0032375 GO:0034379 GO:0034638 GO:0035434 GO:0045721 GO:0050872 GO:0051006 GO:0060586 GO:0070307
6 6 6 6 6 6 6 6 6 6 6 6 6
GO:0070345 GO:0071435 GO:0071474 GO:0072718 GO:0090166 GO:0090527 GO:0098722 GO:1901678 GO:1903236 GO:1903421 GO:1904714 GO:2000675 GO:0009108
6 6 6 6 6 6 6 6 6 6 6 6 67
GO:0014902 GO:0007077 GO:0031110 GO:0045840 GO:0055021 GO:1902930 GO:0008643 GO:0051346 GO:0015672 GO:0043254 GO:0051129 GO:0007062 GO:0030282
67 32 32 32 32 32 107 241 289 242 388 68 68
GO:2000117 GO:0006936 GO:0009894 GO:0035384 GO:0060038 GO:0071616 GO:0034220 GO:0031023 GO:0044255 GO:0072330 GO:0050790 GO:0042102 GO:0010466
68 243 340 33 33 33 543 69 598 110 1480 70 154
GO:0048662 GO:0009314 GO:0002082 GO:0002430 GO:0002692 GO:0006534 GO:0006991 GO:0007635 GO:0009396 GO:0010872 GO:0010984 GO:0032025 GO:0032610
34 296 7 7 7 7 7 7 7 7 7 7 7
GO:0032933 GO:0036445 GO:0043471 GO:0046415 GO:0046541 GO:0048842 GO:0051409 GO:0051584 GO:0051639 GO:0051940 GO:0055057 GO:0060281 GO:0060315
7 7 7 7 7 7 7 7 7 7 7 7 7
GO:0060456 GO:0061365 GO:0070341 GO:0070344 GO:0071372 GO:0071459 GO:0071501 GO:0072537 GO:1900102 GO:1902669 GO:2000047 GO:0051235 GO:0006984
7 7 7 7 7 7 7 7 7 7 7 202 35
GO:0010043 GO:0030397 GO:0045123 GO:0046622 GO:0051081 GO:0048639 GO:0070509 GO:1903050 GO:0006584 GO:0009712 GO:0030433 GO:0032465 GO:0046470
35 35 35 35 35 113 113 157 36 36 36 36 36
GO:0048477 GO:0048747 GO:0050994 GO:0051784 GO:0050770 GO:0000280 GO:0022029 GO:0061383 GO:0071277 GO:0006941 GO:0008016 GO:0034599 GO:0001696
36 36 36 36 115 302 37 37 37 117 162 162 8
GO:0001956 GO:0001993 GO:0005984 GO:0006069 GO:0007168 GO:0010896 GO:0032780 GO:0033129 GO:0033160 GO:0034497 GO:0048671 GO:0060215 GO:0071281
8 8 8 8 8 8 8 8 8 8 8 8 8
GO:0090266 GO:0097286 GO:1903504 GO:1904706 GO:1904738 GO:1904752 GO:0044765 GO:0045786 GO:0006826 GO:0010524 GO:0032410 GO:0046686 GO:0051785
8 8 8 8 8 8 1679 307 38 38 38 38 38
GO:0060420 GO:0034767 GO:1903052 GO:0030239 GO:0071385 GO:1901992 GO:0098660 GO:0032844 GO:0098662 GO:0045927 GO:0030036 GO:0032970 GO:0006897
38 77 120 39 39 39 413 312 363 167 364 215 416
GO:0007009 GO:0014855 GO:0021885 GO:0046068 GO:0048146 GO:1901989 GO:1904427 GO:2001251 GO:0006600 GO:0006750 GO:0010310 GO:0010763 GO:0010801
168 40 40 40 40 40 40 40 9 9 9 9 9
GO:0010866 GO:0016180 GO:0019184 GO:0019852 GO:0021516 GO:0030011 GO:0030730 GO:0031115 GO:0033700 GO:0034433 GO:0034434 GO:0034435 GO:0035067
9 9 9 9 9 9 9 9 9 9 9 9 9
GO:0035428 GO:0036500 GO:0042541 GO:0045780 GO:0046852 GO:0051280 GO:0051764 GO:0060346 GO:0061684 GO:0070493 GO:0086013 GO:0090231 GO:1900115
9 9 9 9 9 9 9 9 9 9 9 9 9
GO:1900116 GO:1902931 GO:1904659 GO:1904994 GO:2000674 GO:2001044 GO:0048513 GO:1903362 GO:0043086 GO:0001666 GO:0016202 GO:0034764 GO:0051209
9 9 9 9 9 9 1990 169 523 217 80 80 80
GO:0051282 GO:0051283 GO:0051937 GO:0071320 GO:0001935 GO:0043271 GO:0051302 GO:1901861 GO:0048634 GO:0051208 GO:0097553 GO:1902656 GO:0051952
80 80 41 41 81 81 81 81 82 82 82 82 42
GO:0071384 GO:1901379 GO:2001252 GO:0036293 GO:0006067 GO:0006878 GO:0009437 GO:0010642 GO:0016540 GO:0021781 GO:0030157 GO:0030502 GO:0032530
42 42 42 222 10 10 10 10 10 10 10 10 10
GO:0032769 GO:0033605 GO:0035815 GO:0036152 GO:0042416 GO:0043206 GO:0044331 GO:0045978 GO:0046655 GO:0050812 GO:0051284 GO:0051583 GO:0051589
10 10 10 10 10 10 10 10 10 10 10 10 10
GO:0051934 GO:0060192 GO:0060347 GO:0060732 GO:0090493 GO:0090494 GO:0097050 GO:1900025 GO:1903579 GO:1904355 GO:1903169 GO:0040008 GO:0008015
10 10 10 10 10 10 10 10 10 10 83 426 374
GO:1903364 GO:0098657 GO:0072331 GO:0048285 GO:0008610 GO:0003013 GO:0043161 GO:0055024 GO:0070527 GO:0007626 GO:0033002 GO:0065009 GO:0009896
128 43 175 324 376 377 225 44 44 130 130 1783 226
GO:0045859 GO:0006633 GO:0010675 GO:0051592 GO:0007049 GO:0016192 GO:0006825 GO:0009312 GO:0032354 GO:0034219 GO:0035728 GO:0035729 GO:0035810
486 86 86 86 1038 870 11 11 11 11 11 11 11
GO:0042559 GO:0043457 GO:0044804 GO:0050930 GO:0051895 GO:0051953 GO:0055070 GO:0070168 GO:0070633 GO:0071285 GO:0071872 GO:0090026 GO:0090330
11 11 11 11 11 11 11 11 11 11 11 11 11
GO:2000757 GO:0006026 GO:0016051 GO:0031214 GO:0046777 GO:0046718 GO:0007588 GO:0036503 GO:0031099 GO:0030260 GO:0044409 GO:0045766 GO:0051806
11 45 132 87 180 88 46 46 134 89 89 89 89
GO:0051828 GO:0072659 GO:0006968 GO:0015837 GO:0045844 GO:0048199 GO:0048636 GO:0051225 GO:1901863 GO:0010769 GO:0019216 GO:0001774 GO:0001921
89 135 47 47 47 47 47 47 47 184 184 12 12
GO:0006577 GO:0006895 GO:0007020 GO:0008356 GO:0010523 GO:0010919 GO:0014048 GO:0015874 GO:0019682 GO:0033197 GO:0035855 GO:0036151 GO:0042572
12 12 12 12 12 12 12 12 12 12 12 12 12
GO:0048103 GO:0048148 GO:0050685 GO:0050849 GO:0051016 GO:0051187 GO:0051590 GO:0061430 GO:0071539 GO:0071801 GO:0086011 GO:1901071 GO:1901984
12 12 12 12 12 12 12 12 12 12 12 12 12
GO:1903392 GO:2000036 GO:0006637 GO:0009064 GO:0030500 GO:0035383 GO:0046890 GO:0050671 GO:0051238 GO:0060047 GO:0070482 GO:0018105 GO:0032946
12 12 48 48 48 48 91 91 91 186 236 187 92
GO:0055017 GO:0007517 GO:0043269 GO:0061061 GO:0003015 GO:0006000 GO:0007026 GO:0007095 GO:0007250 GO:0010971 GO:0023019 GO:0030214 GO:0030220
49 239 343 396 189 13 13 13 13 13 13 13 13
GO:0032225 GO:0033127 GO:0033158 GO:0034375 GO:0036344 GO:0045540 GO:0045830 GO:0048308 GO:0048313 GO:0050995 GO:0051580 GO:0060307 GO:0070293
13 13 13 13 13 13 13 13 13 13 13 13 13
GO:0071371 GO:0071732 GO:0071871 GO:0090208 GO:0097067 GO:0098659 GO:0099587 GO:0099625 GO:1902236 GO:0001938 GO:0036498 GO:1901655
13 13 13 13 13 13 13 13 13 50 50 50
[ reached getOption("max.print") -- omitted 1978 entries ]
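Both geneCounts() and universeCounts() return vectors named by GO ID. These two quantities appear to correspond to the Count and Size columns of summary(hgOver). Matching the two vectors by name, rather than by position, gives the fraction of each term's universe genes that are differentially expressed. The sketch below copies the first three entries of each output above into small named vectors, with the order deliberately shuffled in one of them. With a live result, gc and uc would simply be geneCounts(hgOver) and universeCounts(hgOver). The helper name count_table is hypothetical.

```r
# First three entries of the geneCounts() and universeCounts() output above;
# with a live result: gc <- geneCounts(hgOver); uc <- universeCounts(hgOver)
gc <- c("GO:0046501" = 6, "GO:0006779" = 7, "GO:0006778" = 8)
uc <- c("GO:0006778" = 24, "GO:0046501" = 9, "GO:0006779" = 15)  # order shuffled on purpose

# Hypothetical helper: align the two vectors by GO ID, not by position
count_table <- function(gene_counts, universe_counts) {
  ids <- names(gene_counts)
  data.frame(
    GOID  = ids,
    Count = unname(gene_counts),
    Size  = unname(universe_counts[ids]),
    stringsAsFactors = FALSE
  )
}

tab <- count_table(gc, uc)
tab$Fraction <- round(tab$Count / tab$Size, 3)  # share of the term's genes that are DE
print(tab[order(-tab$Fraction), ])
```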
# Also, plot the GO directed acyclic graph (DAG); plotting a graph object requires the Rgraphviz package:
library(Rgraphviz)
plot(goDag(hgOver))
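The DAG structure explains why broad terms have much larger counts, such as GO:0044699 with 6648 universe genes in the output above. GO annotations follow the "true path rule": a gene annotated to a term is also counted for all of that term's ancestors. This nesting is also what the conditional = TRUE option of GOHyperGParams is meant to account for. The sketch below propagates direct annotations up a tiny toy DAG in base R. The term names, genes, and helper functions are hypothetical and are not the GOstats implementation.

```r
# Hypothetical toy GO fragment: each term lists its parent terms.
parents <- list(
  termA = character(0),          # root
  termB = "termA",
  termC = "termA",
  termD = c("termB", "termC")    # a term may have several parents in a DAG
)
# Hypothetical direct annotations: gene -> terms
direct <- list(g1 = "termD", g2 = "termB", g3 = "termC")

# Collect all ancestors of a term by walking up the parent links
ancestors <- function(term, parents) {
  result <- character(0)
  frontier <- parents[[term]]
  while (length(frontier) > 0) {
    fresh <- setdiff(frontier, result)
    result <- c(result, fresh)
    frontier <- unique(unlist(parents[fresh], use.names = FALSE))
  }
  result
}

# Count genes per term after propagating each annotation to all ancestors
term_counts <- function(direct, parents) {
  counts <- setNames(integer(length(parents)), names(parents))
  for (gene_terms in direct) {
    # a gene is counted once per term, even if reached via several paths
    all_terms <- unique(c(gene_terms,
                          unlist(lapply(gene_terms, ancestors, parents = parents))))
    counts[all_terms] <- counts[all_terms] + 1L
  }
  counts
}

counts_toy <- term_counts(direct, parents)
print(counts_toy)
```

Here termD reaches the root through two paths but gene g1 is still counted once for termA, so the root's count equals the number of annotated genes.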
# Finally, generate the report as an HTML file that can be read using any browser, as follows:
htmlReport(hgOver, file="ALL_hgco.html")
2 2 2 5 4 4 2 2 2 7 5 5 1
GO:0001842 GO:0006001 GO:0006787 GO:0010891 GO:0010898 GO:0021670 GO:0022614 GO:0031442 GO:0033015 GO:0033483 GO:0033603 GO:0034447 GO:0034501
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0035090 GO:0035733 GO:0040016 GO:0042045 GO:0042167 GO:0042760 GO:0043476 GO:0043482 GO:0044539 GO:0045199 GO:0045541 GO:0045842 GO:0046149
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0046348 GO:0046351 GO:0050703 GO:0051315 GO:0071472 GO:0090206 GO:0090315 GO:1901970 GO:1902101 GO:1903431 GO:2000210 GO:0000041 GO:0018210
1 1 1 1 1 1 1 1 1 1 1 3 3
GO:0031100 GO:0032781 GO:0045839 GO:0051924 GO:0031145 GO:0043154 GO:0060191 GO:1904062 GO:2000021 GO:0006820 GO:0046503 GO:0050432 GO:0070542
3 2 2 5 3 3 3 5 5 8 2 2 2
GO:0072678 GO:0051297 GO:0048469 GO:0032956 GO:0031331 GO:0010948 GO:0031570 GO:0007589 GO:0042743 GO:0060761 GO:0043270 GO:0006109 GO:0010522
2 3 4 6 6 5 4 3 2 2 5 4 3
GO:1904064 GO:0045861 GO:0030029 GO:0001561 GO:0003321 GO:0006983 GO:0009635 GO:0010269 GO:0010457 GO:0010873 GO:0010889 GO:0015697 GO:0019740
3 6 11 1 1 1 1 1 1 1 1 1 1
GO:0021527 GO:0022417 GO:0030952 GO:0032372 GO:0032375 GO:0034379 GO:0034638 GO:0035434 GO:0045721 GO:0050872 GO:0051006 GO:0060586 GO:0070307
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0070345 GO:0071435 GO:0071474 GO:0072718 GO:0090166 GO:0090527 GO:0098722 GO:1901678 GO:1903236 GO:1903421 GO:1904714 GO:2000675 GO:0009108
1 1 1 1 1 1 1 1 1 1 1 1 3
GO:0014902 GO:0007077 GO:0031110 GO:0045840 GO:0055021 GO:1902930 GO:0008643 GO:0051346 GO:0015672 GO:0043254 GO:0051129 GO:0007062 GO:0030282
3 2 2 2 2 2 4 7 8 7 10 3 3
GO:2000117 GO:0006936 GO:0009894 GO:0035384 GO:0060038 GO:0071616 GO:0034220 GO:0031023 GO:0044255 GO:0072330 GO:0050790 GO:0042102 GO:0010466
3 7 9 2 2 2 13 3 14 4 30 3 5
GO:0048662 GO:0009314 GO:0002082 GO:0002430 GO:0002692 GO:0006534 GO:0006991 GO:0007635 GO:0009396 GO:0010872 GO:0010984 GO:0032025 GO:0032610
2 8 1 1 1 1 1 1 1 1 1 1 1
GO:0032933 GO:0036445 GO:0043471 GO:0046415 GO:0046541 GO:0048842 GO:0051409 GO:0051584 GO:0051639 GO:0051940 GO:0055057 GO:0060281 GO:0060315
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0060456 GO:0061365 GO:0070341 GO:0070344 GO:0071372 GO:0071459 GO:0071501 GO:0072537 GO:1900102 GO:1902669 GO:2000047 GO:0051235 GO:0006984
1 1 1 1 1 1 1 1 1 1 1 6 2
GO:0010043 GO:0030397 GO:0045123 GO:0046622 GO:0051081 GO:0048639 GO:0070509 GO:1903050 GO:0006584 GO:0009712 GO:0030433 GO:0032465 GO:0046470
2 2 2 2 2 4 4 5 2 2 2 2 2
GO:0048477 GO:0048747 GO:0050994 GO:0051784 GO:0050770 GO:0000280 GO:0022029 GO:0061383 GO:0071277 GO:0006941 GO:0008016 GO:0034599 GO:0001696
2 2 2 2 4 8 2 2 2 4 5 5 1
GO:0001956 GO:0001993 GO:0005984 GO:0006069 GO:0007168 GO:0010896 GO:0032780 GO:0033129 GO:0033160 GO:0034497 GO:0048671 GO:0060215 GO:0071281
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0090266 GO:0097286 GO:1903504 GO:1904706 GO:1904738 GO:1904752 GO:0044765 GO:0045786 GO:0006826 GO:0010524 GO:0032410 GO:0046686 GO:0051785
1 1 1 1 1 1 33 8 2 2 2 2 2
GO:0060420 GO:0034767 GO:1903052 GO:0030239 GO:0071385 GO:1901992 GO:0098660 GO:0032844 GO:0098662 GO:0045927 GO:0030036 GO:0032970 GO:0006897
2 3 4 2 2 2 10 8 9 5 9 6 10
GO:0007009 GO:0014855 GO:0021885 GO:0046068 GO:0048146 GO:1901989 GO:1904427 GO:2001251 GO:0006600 GO:0006750 GO:0010310 GO:0010763 GO:0010801
5 2 2 2 2 2 2 2 1 1 1 1 1
GO:0010866 GO:0016180 GO:0019184 GO:0019852 GO:0021516 GO:0030011 GO:0030730 GO:0031115 GO:0033700 GO:0034433 GO:0034434 GO:0034435 GO:0035067
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0035428 GO:0036500 GO:0042541 GO:0045780 GO:0046852 GO:0051280 GO:0051764 GO:0060346 GO:0061684 GO:0070493 GO:0086013 GO:0090231 GO:1900115
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:1900116 GO:1902931 GO:1904659 GO:1904994 GO:2000674 GO:2001044 GO:0048513 GO:1903362 GO:0043086 GO:0001666 GO:0016202 GO:0034764 GO:0051209
1 1 1 1 1 1 38 5 12 6 3 3 3
GO:0051282 GO:0051283 GO:0051937 GO:0071320 GO:0001935 GO:0043271 GO:0051302 GO:1901861 GO:0048634 GO:0051208 GO:0097553 GO:1902656 GO:0051952
3 3 2 2 3 3 3 3 3 3 3 3 2
GO:0071384 GO:1901379 GO:2001252 GO:0036293 GO:0006067 GO:0006878 GO:0009437 GO:0010642 GO:0016540 GO:0021781 GO:0030157 GO:0030502 GO:0032530
2 2 2 6 1 1 1 1 1 1 1 1 1
GO:0032769 GO:0033605 GO:0035815 GO:0036152 GO:0042416 GO:0043206 GO:0044331 GO:0045978 GO:0046655 GO:0050812 GO:0051284 GO:0051583 GO:0051589
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0051934 GO:0060192 GO:0060347 GO:0060732 GO:0090493 GO:0090494 GO:0097050 GO:1900025 GO:1903579 GO:1904355 GO:1903169 GO:0040008 GO:0008015
1 1 1 1 1 1 1 1 1 1 3 10 9
GO:1903364 GO:0098657 GO:0072331 GO:0048285 GO:0008610 GO:0003013 GO:0043161 GO:0055024 GO:0070527 GO:0007626 GO:0033002 GO:0065009 GO:0009896
4 2 5 8 9 9 6 2 2 4 4 34 6
GO:0045859 GO:0006633 GO:0010675 GO:0051592 GO:0007049 GO:0016192 GO:0006825 GO:0009312 GO:0032354 GO:0034219 GO:0035728 GO:0035729 GO:0035810
11 3 3 3 21 18 1 1 1 1 1 1 1
GO:0042559 GO:0043457 GO:0044804 GO:0050930 GO:0051895 GO:0051953 GO:0055070 GO:0070168 GO:0070633 GO:0071285 GO:0071872 GO:0090026 GO:0090330
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:2000757 GO:0006026 GO:0016051 GO:0031214 GO:0046777 GO:0046718 GO:0007588 GO:0036503 GO:0031099 GO:0030260 GO:0044409 GO:0045766 GO:0051806
1 2 4 3 5 3 2 2 4 3 3 3 3
GO:0051828 GO:0072659 GO:0006968 GO:0015837 GO:0045844 GO:0048199 GO:0048636 GO:0051225 GO:1901863 GO:0010769 GO:0019216 GO:0001774 GO:0001921
3 4 2 2 2 2 2 2 2 5 5 1 1
GO:0006577 GO:0006895 GO:0007020 GO:0008356 GO:0010523 GO:0010919 GO:0014048 GO:0015874 GO:0019682 GO:0033197 GO:0035855 GO:0036151 GO:0042572
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0048103 GO:0048148 GO:0050685 GO:0050849 GO:0051016 GO:0051187 GO:0051590 GO:0061430 GO:0071539 GO:0071801 GO:0086011 GO:1901071 GO:1901984
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:1903392 GO:2000036 GO:0006637 GO:0009064 GO:0030500 GO:0035383 GO:0046890 GO:0050671 GO:0051238 GO:0060047 GO:0070482 GO:0018105 GO:0032946
1 1 2 2 2 2 3 3 3 5 6 5 3
GO:0055017 GO:0007517 GO:0043269 GO:0061061 GO:0003015 GO:0006000 GO:0007026 GO:0007095 GO:0007250 GO:0010971 GO:0023019 GO:0030214 GO:0030220
2 6 8 9 5 1 1 1 1 1 1 1 1
GO:0032225 GO:0033127 GO:0033158 GO:0034375 GO:0036344 GO:0045540 GO:0045830 GO:0048308 GO:0048313 GO:0050995 GO:0051580 GO:0060307 GO:0070293
1 1 1 1 1 1 1 1 1 1 1 1 1
GO:0071371 GO:0071732 GO:0071871 GO:0090208 GO:0097067 GO:0098659 GO:0099587 GO:0099625 GO:1902236 GO:0001938 GO:0036498 GO:1901655
1 1 1 1 1 1 1 1 1 2 2 2
[ reached getOption("max.print") -- omitted 1978 entries ]
library(Rgraphviz)  # plot() for the graphNEL object returned by goDag() is provided by Rgraphviz
plot(goDag(hgOver))
htmlReport(hgOver, file="ALL_hgco.html")
The GOHyperGParams object is a parameter object that organizes everything needed to run the hypergeometric test on the GO annotations of a gene set. Its slots hold the GO ontology to test (BP, MF, or CC), the selected genes and the gene universe (Entrez IDs), the annotation package, the p-value cutoff, the test direction, and whether the test is conditional on the GO structure. The hyperGTest function runs the hypergeometric test with these parameters and computes the over- or underrepresentation of each GO term in the gene set. By default, however, every term is tested independently and the GO structure is ignored. GO is organized hierarchically as a directed acyclic graph (DAG), in which parent terms are more general than their child terms, so the same genes contribute to many related terms. Setting the conditional argument to TRUE in GOHyperGParams makes use of this structure: testing starts at the leaves of the graph (the most specific terms, which have no child terms) and moves upward, and genes annotated to a child term that is already significant are removed before its parent terms are tested. The following table lists the top-ranked enriched GO terms. The first column contains the GO IDs, the second the p-value obtained from the hypergeometric test, and the third the odds ratio; the remaining columns give the expected count, the observed count, the term size, and the term name, respectively.
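To make the per-term calculation concrete, here is a minimal base R sketch of the kind of hypergeometric test that hyperGTest applies to each GO term. The counts are made up for a single hypothetical term (a universe of 1000 genes, 40 of them annotated to the term, 100 selected genes, 12 of which carry the annotation) and are not taken from the data above. The sketch computes the quantities behind the ExpCount, Count, Size, and Pvalue columns and checks that the upper-tail hypergeometric p-value matches a one-sided Fisher's exact test. The odds ratio shown is the plain sample odds ratio, an assumption for illustration; GOstats may estimate it differently.

```r
# Hypothetical counts for one GO term (illustration only, not real data)
N <- 1000   # genes in the universe
m <- 40     # universe genes annotated to the term (the "Size" column)
k <- 100    # selected genes, e.g. differentially expressed ones
x <- 12     # selected genes annotated to the term (the "Count" column)

# Expected count if the k genes were drawn at random (the "ExpCount" column)
exp_count <- k * m / N

# Over-representation p-value: P(X >= x) for X ~ Hypergeometric(m, N - m, k)
p_over <- phyper(x - 1, m, N - m, k, lower.tail = FALSE)

# Under-representation p-value: P(X <= x)
p_under <- phyper(x, m, N - m, k)

# 2x2 table: annotated / not annotated versus selected / not selected
tab <- matrix(c(x, k - x, m - x, N - m - (k - x)), nrow = 2,
              dimnames = list(c("annotated", "not annotated"),
                              c("selected", "not selected")))

# Plain sample odds ratio of the table
odds_ratio <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# A one-sided Fisher's exact test on the same table gives the same p-value
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(ExpCount = exp_count, Count = x, Size = m,
        OddsRatio = odds_ratio, Pvalue = p_over))
```

Twelve annotated genes against an expectation of four gives a small upper-tail p-value, which is the pattern an enriched term shows in the table.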
KEGG is another source of pathway and functional information; however, its current version is not freely available for use in R. KEGG is a comprehensive database of pathways (for example, signaling and metabolic pathways); to browse it, visit the home page at http://www.genome.jp/kegg/. For KEGG enrichment of genes, use KEGGHyperGParams instead of GOHyperGParams: replace GOHyperGParams with KEGGHyperGParams when building the parameter object and set the conditional test to FALSE, since KEGG pathways are not organized as a GO-style DAG. The analysis also requires the KEGG.db package. Note that KEGG.db is no longer updated; for more information about it, visit http://www.bioconductor.org/packages/2.13/data/annotation/manuals/KEGG.db/man/KEGG.db.pdf.
The MLP package can also be used for gene set enrichment against various pathway databases. For details, refer to the Bioconductor manual at http://bioconductor.org/packages/devel/bioc/manuals/MLP/man/MLP.pdf. The following plot shows part of the directed acyclic graph of the GO categories in the given data:
Clustering groups similar genes together into groups (called clusters) while keeping dissimilar genes in separate groups. Genes that fall into the same cluster follow a similar expression pattern across the given conditions. This recipe presents hierarchical clustering, a widely used approach in gene expression analysis.
The clustering recipe presented here uses the leukemia data (leukemiasEset) from the earlier recipes. To keep the computation fast, however, we will use only part of it: 12 arrays and the first 100 genes.
# Create your dataset for clustering purposes from the leukemia data again.
# Use only the first 100 genes (rows) of 12 selected arrays for demonstration purposes:
library(Biobase)        # provides sampleNames() and exprs()
library(leukemiasEset)  # provides the leukemiasEset ExpressionSet
data(leukemiasEset)
eset <- leukemiasEset[, sampleNames(leukemiasEset)[c(1:3, 13:15, 25:27, 49:51)]]
c.data <- exprs(eset[1:100,])
# Do an array clustering. Use the EMA package to perform clustering:
install.packages("EMA")
library(EMA)
# To perform the clustering of arrays, simply use the c.data object with
# clustering from the EMA library as follows:
c.array <- clustering(data=c.data, metric="pearson", method="ward")
# Create the dendrogram plot for the cluster by plotting the clusters as follows:
plot(c.array)
# To cluster the gene, simply transpose the data matrix and use it as input
# for the data argument in the clustering function, and define the similarity
# metric and clustering method as follows:
c.gene <- clustering(data=t(c.data), metric="pearsonabs", method="ward")
# Plot the results as follows (the following screenshot shows only the
# 100 genes selected above, which keeps the dendrogram readable):
plot(c.gene)
# A more detailed visualization in the form of a heatmap is presented in the
# More visualizations for gene expression data recipe of this chapter.
The similarity measures that can be used include the Pearson and Spearman correlation coefficients and the Euclidean, Manhattan, and Jaccard distances. The similarity scores are converted into a distance matrix, which is the input to the clustering algorithm. Several linkage methods are implemented for hierarchical clustering, including average, single, complete, and Ward. Other clustering methods, such as k-means or PAM (partitioning around medoids), can also be used.
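The same idea can be sketched in base R without EMA, which makes the distance and linkage choices explicit. This is not EMA's clustering function; it uses hclust, cutree, and kmeans from the stats package on simulated data (an assumption for illustration, not the leukemia set): 20 hypothetical genes on 6 arrays, where the first 10 genes are high in arrays 1 to 3 and the last 10 are high in arrays 4 to 6. Genes are clustered with a Pearson correlation distance (1 - r) and average linkage, the tree is cut into two clusters, and k-means is run on the same matrix for comparison.

```r
set.seed(1)

# Simulated expression matrix: genes in rows, arrays in columns
n_genes <- 20
pattern_a <- c(3, 3, 3, 0, 0, 0)   # high in arrays 1-3
pattern_b <- c(0, 0, 0, 3, 3, 3)   # high in arrays 4-6
x <- rbind(
  t(replicate(n_genes / 2, pattern_a + rnorm(6, sd = 0.3))),
  t(replicate(n_genes / 2, pattern_b + rnorm(6, sd = 0.3)))
)
rownames(x) <- paste0("gene", seq_len(n_genes))
colnames(x) <- paste0("array", 1:6)

# Pearson correlation distance between genes (rows): d = 1 - r
d_genes <- as.dist(1 - cor(t(x)))

# Hierarchical clustering with average linkage; other linkage options
# in hclust include "single", "complete", and "ward.D2"
hc_genes <- hclust(d_genes, method = "average")
gene_groups <- cutree(hc_genes, k = 2)

# Clustering the arrays instead means correlating the columns
hc_arrays <- hclust(as.dist(1 - cor(x)), method = "average")

# A partitioning alternative: k-means on the rows (Euclidean distance)
km <- kmeans(x, centers = 2, nstart = 10)

plot(hc_genes, main = "Genes: 1 - Pearson r, average linkage")
table(hierarchical = gene_groups, kmeans = km$cluster)
```

The two simulated gene groups are strongly anti-correlated, so a signed correlation distance separates them. An absolute-correlation distance (1 - |r|), which is what the name of EMA's "pearsonabs" metric suggests, would instead place anti-correlated genes close together.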
Generating networks from gene expression data has become increasingly common in recent years. At an abstract level, a network represents the relationships between genes inferred from the data. There are many ways to infer these relationships; in this recipe, we will explore relationships based on the correlation between genes.
We will select a small fraction of the dataset for this recipe, simply to make the process faster and less computationally demanding. Moreover, it is good practice to restrict such networks to the more important genes, which makes the network less noisy.
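One simple way to reduce the data, sketched here as an assumption rather than as this recipe's own selection step, is to keep the genes with the highest variance across samples. The matrix below is simulated (1000 hypothetical genes on 12 samples, with the first 50 genes given extra spread), and the cutoff of 50 genes is an arbitrary illustrative choice.

```r
set.seed(7)

# Simulated matrix: 1000 hypothetical genes x 12 samples; the first 50 genes
# are scaled up so that they should rank as the most variable
expr <- matrix(rnorm(1000 * 12), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), paste0("s", 1:12)))
expr[1:50, ] <- expr[1:50, ] * 5

# Rank genes by their variance across samples and keep the top n
gene_var <- apply(expr, 1, var)
top_n <- 50
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(top_n)]
expr_small <- expr[top_genes, ]

dim(expr_small)
```

Variance filtering keeps genes that actually change across conditions; other criteria, such as differential expression statistics from an earlier step, are equally reasonable.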
The method explained in this recipe computes the relationship between genes in terms of correlation or similarity measures. The function computes the pairwise similarity or correlation between genes from the expression data and returns it as a matrix. A threshold then decides which genes are connected: pairs whose correlation or similarity exceeds the threshold are joined by an edge, and all other pairs remain unconnected. The resulting adjacency matrix is then used to build the graph object.
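The correlation-and-threshold procedure can be sketched in base R. This is a plain alternative for illustration, not the WGCNA API: the data are simulated (three hypothetical co-regulated genes plus five noise genes on 20 samples), and the cutoff of 0.8 on the absolute Pearson correlation is an arbitrary choice. The result is a symmetric 0/1 adjacency matrix and an edge list that keeps the sign of each correlation.

```r
set.seed(42)

# Simulated expression: 8 hypothetical genes x 20 samples
n_samples <- 20
driver <- rnorm(n_samples)
expr <- rbind(
  geneA = driver + rnorm(n_samples, sd = 0.1),
  geneB = driver + rnorm(n_samples, sd = 0.1),
  geneC = -driver + rnorm(n_samples, sd = 0.1),  # anti-correlated with A and B
  matrix(rnorm(5 * n_samples), nrow = 5,
         dimnames = list(paste0("noise", 1:5), NULL))
)

# Pairwise Pearson correlation between genes (rows)
cor_mat <- cor(t(expr))

# Connect two genes only if |r| reaches the (arbitrary) threshold
threshold <- 0.8
adj <- (abs(cor_mat) >= threshold) * 1
diag(adj) <- 0   # no self-loops

# Edge list from the upper triangle, with the signed correlation as a weight
idx <- which(upper.tri(adj) & adj == 1, arr.ind = TRUE)
edges <- data.frame(from = rownames(adj)[idx[, 1]],
                    to   = colnames(adj)[idx[, 2]],
                    r    = round(cor_mat[idx], 3))
print(edges)
```

A graph package such as igraph can turn the adjacency matrix into a graph object, for example with `graph_from_adjacency_matrix(adj, mode = "undirected")`. Using the absolute correlation links anti-correlated genes too; drop `abs()` if only positive co-expression should count.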
Instead of using the WGCNA package, you can also compute the distances or correlations yourself to build such networks, although WGCNA makes this considerably easier. Several other network inference methods exist, depending on the experiment and the type of data, and many of them are supported in R through dedicated packages. These methods are beyond the scope of this text; refer to the relevant sources for details.
The visualizations or plots explained in this recipe are heatmaps, Venn diagrams, and volcano plots.