bioAnno: an R package for building annotation packages from KEGG, NCBI and Ensembl information, returning an OrgDb object such as org.Hs.eg.db.

1. Introduction

With the growing volume of high-throughput data, annotation packages have become necessary for anyone who wants to do functional enrichment analysis, ID conversion or other related analyses. bioAnno provides wrapper functions, including fromKEGG, fromENSEMBL, fromNCBI and fromAnnHub, to build annotation packages. Making organism packages is a straightforward process with these helper functions. Users can also build a package from their own annotation file with fromOwn.

2. Software Usage

2.1 Installation

The package can be installed with the following commands:

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("bioAnno")

2.2 Load package

library(bioAnno)

2.3 How to use it

library(bioAnno)
## Build an E. coli annotation package from the KEGG database
## with the fromKEGG function.
fromKEGG(species="eco", install = FALSE)
## #########################################################################
## The bioAnno package downloads and uses KEGG data.Non-academic uses may
## require a KEGG license agreement (details at http://www.kegg.jp/kegg/legal.html)
## The Gene Ontology are downloaded from NCBI.
## #########################################################################
## Creating package in /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//Rtmpry537R/org.eco.eg.db 
## ################################################################
## Please find your annotation package in ...
## /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//Rtmpry537R/org.eco.eg.db 
## You can install it by using
## install.packages("/var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//Rtmpry537R/org.eco.eg.db",repos = NULL,type='source') 
## ################################################################
## Here are the tables in the package org.eco.eg.db ...
## gene_info genes go go_all go_bp go_bp_all go_cc go_cc_all go_mf go_mf_all ko map_counts map_metadata metadata path 
## ################################################################
## [1] "/var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//Rtmpry537R/org.eco.eg.db"
## This builds the "org.eco.eg.db" package, which contains
## KEGG and GO annotation. Use install = TRUE to install
## the package directly.
## Build an Arabidopsis thaliana annotation package with the
## fromAnnHub function.
fromAnnHub(species="ath", install = FALSE)
## Creating package in /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//Rtmpry537R/org.ath.eg.db 
## ################################################################
## Please find your annotation package in ...
## /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//Rtmpry537R/org.ath.eg.db 
## You can install it by using
## install.packages("/var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//Rtmpry537R/org.ath.eg.db",repos = NULL,type='source') 
## ################################################################
## Here are the tables in the package org.ath.eg.db ...
## gene_info genes go go_all go_bp go_bp_all go_cc go_cc_all go_mf go_mf_all map_counts map_metadata metadata path refseq symbol 
## ################################################################
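
The two-step workflow described above (build with install = FALSE, then install from the reported directory) can be sketched as follows. This is a hedged illustration, not the package's documented usage: it assumes that fromKEGG returns the path of the generated package, as the `[1] ".../org.eco.eg.db"` line in the output suggests, and that bioAnno is installed and KEGG is reachable. The variable name pkg_path is introduced here for illustration only.

```r
# Sketch only: runs the build when bioAnno is available, otherwise does nothing.
pkg_path <- NULL
if (requireNamespace("bioAnno", quietly = TRUE)) {
  library(bioAnno)

  # Assumption: the return value is the directory of the generated package,
  # as suggested by the '[1] ".../org.eco.eg.db"' line in the output above.
  pkg_path <- fromKEGG(species = "eco", install = FALSE)

  # Same command that bioAnno prints in its own message.
  install.packages(pkg_path, repos = NULL, type = "source")

  # The package name org.eco.eg.db is taken from the output above.
  library(org.eco.eg.db)
}
```

Passing install = TRUE instead, as noted in the comments above, skips the separate install.packages call.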

2.4 Main Functions

fromKEGG builds an annotation package by extracting annotation information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Use the KEGG species code as the query name.

fromNCBI builds an annotation package by extracting annotation information from the NCBI database.

fromENSEMBL builds an annotation package by extracting annotation information from the Ensembl database. It can also build annotation packages for plants with the parameter plant = TRUE.

fromAnnHub builds an annotation package with the AnnotationHub package.

getTable gets an annotation table from a temporary package; the user must provide the temporary path.

3. Using the annotation package you created

The organism-level package (an ‘org’ package) you created uses a central gene identifier and contains mappings between this identifier and other kinds of identifiers. The most common interface for retrieving data is the select method.

# First build your own annotation package, then load it
data(ath)
fromOwn(geneinfo = ath, install = TRUE)
## Please make sure you have Gene Ontology and KEGG pathway
##         or KO data.frame ready.
## Creating package in /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//Rtmpry537R/org.species.eg.db
library(org.species.eg.db)

Four common methods work together to provide the select interface. The first is columns, which tells you what kinds of values you can retrieve as columns in the final result.

columns(org.species.eg.db)
## [1] "ENTREZID"    "EVIDENCE"    "EVIDENCEALL" "GID"         "GO"         
## [6] "GOALL"       "ONTOLOGY"    "ONTOLOGYALL" "PATH"

The next method is keytypes, which tells you the kinds of things that can be used as keys.

keytypes(org.species.eg.db)
## [1] "ENTREZID"    "EVIDENCE"    "EVIDENCEALL" "GID"         "GO"         
## [6] "GOALL"       "ONTOLOGY"    "ONTOLOGYALL" "PATH"

The third method is keys, which retrieves all the viable keys of a particular type.

key <- keys(org.species.eg.db)

Finally, select extracts data using the values supplied by the other methods.

result <- select(org.species.eg.db, keys = key,
                 columns = c("GO", "PATH"), keytype = "GID")
## 'select()' returned 1:1 mapping between keys and columns
head(result)
##         GID         GO  PATH
## 1 AT1G01010 GO:0008150 01100
## 2 AT1G01020 GO:0008150 01100
## 3 AT1G03987 GO:0008150 01100
## 4 AT1G01030 GO:0008150 01100
## 5 AT1G01040 GO:0008150 01100
## 6 AT1G03993 GO:0008150 01100
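
Because select returns an ordinary data frame, base R tools can summarise it. The sketch below is illustrative only: it rebuilds the six head(result) rows shown above by hand instead of querying the package, and the names genes_by_go and genes_per_path are introduced here and are not part of bioAnno.

```r
# Hand-built copy of the head(result) rows shown above; in practice
# `result` would come from select() on the generated OrgDb package.
result <- data.frame(
  GID  = c("AT1G01010", "AT1G01020", "AT1G03987",
           "AT1G01030", "AT1G01040", "AT1G03993"),
  GO   = rep("GO:0008150", 6),
  PATH = rep("01100", 6),
  stringsAsFactors = FALSE
)

# Gene IDs grouped by GO term: a named list with one element per term.
genes_by_go <- split(result$GID, result$GO)

# Number of distinct genes annotated to each pathway.
genes_per_path <- tapply(result$GID, result$PATH,
                         function(ids) length(unique(ids)))
```

The same two calls apply unchanged to the full result, where a gene may appear in several rows if it maps to more than one GO term or pathway.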