bioAnno: An R package that builds annotation packages from KEGG, NCBI and Ensembl information and returns an OrgDb object such as org.Hs.eg.db.

1. Introduction

As the volume of high-throughput data grows, annotation packages have become necessary for functional enrichment analysis, ID conversion and related analyses. bioAnno provides the wrapper functions fromKEGG, fromENSEMBL, fromNCBI and fromAnnHub to build annotation packages. Building an organism package with these helper functions is straightforward. Users can also build a package from their own annotation file with fromOwn.

2. Software Usage

2.1 Installation

The package can be installed with the following commands:

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("bioAnno")

2.2 Load package

library(bioAnno)

2.3 How to use it

library(bioAnno)
## Build an E. coli annotation package from the KEGG database with
## the fromKEGG function.
fromKEGG(species="eco", install = FALSE)
## #########################################################################
## The bioAnno package downloads and uses KEGG data.Non-academic uses may
## require a KEGG license agreement (details at http://www.kegg.jp/kegg/legal.html)
## The Gene Ontology are downloaded from NCBI.
## #########################################################################
## Creating package in /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//RtmpLyWnIU/org.eco.eg.db 
## ################################################################
## Please find your annotation package in ...
## /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//RtmpLyWnIU/org.eco.eg.db 
## You can install it by using
## install.packages("/var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//RtmpLyWnIU/org.eco.eg.db",repos = NULL,type='source') 
## ################################################################
## Here are the tables in the package org.eco.eg.db ...
## gene_info genes go go_all go_bp go_bp_all go_cc go_cc_all go_mf go_mf_all ko map_counts map_metadata metadata path 
## ################################################################
## [1] "/var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//RtmpLyWnIU/org.eco.eg.db"
## The call above builds the "org.eco.eg.db" package, which contains
## KEGG and GO annotation. Use install = TRUE to install the package
## directly.
## Build an Arabidopsis thaliana annotation package with the fromAnnHub
## function.
fromAnnHub(species="ath", install = FALSE)
## Creating package in /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//RtmpLyWnIU/org.ath.eg.db 
## ################################################################
## Please find your annotation package in ...
## /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//RtmpLyWnIU/org.ath.eg.db 
## You can install it by using
## install.packages("/var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//RtmpLyWnIU/org.ath.eg.db",repos = NULL,type='source') 
## ################################################################
## Here are the tables in the package org.ath.eg.db ...
## gene_info genes go go_all go_bp go_bp_all go_cc go_cc_all go_mf go_mf_all map_counts map_metadata metadata path refseq symbol 
## ################################################################
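
The messages above suggest that the builder functions return the path of the generated package directory and that the package can then be installed from source. The following is a minimal sketch of that step. `install_built_package` is a hypothetical helper name, not part of bioAnno, and the assumption that fromKEGG returns the path is inferred only from the printed output.

```r
# Hypothetical helper (not part of bioAnno): install a package that one of
# the builder functions wrote to disk.
install_built_package <- function(pkg_path) {
  # Fail early if the directory is missing, e.g. because the R session
  # that created the temporary directory has already ended.
  if (!dir.exists(pkg_path)) {
    stop("Package directory not found: ", pkg_path)
  }
  # Same call that the builder prints in its closing message.
  install.packages(pkg_path, repos = NULL, type = "source")
  invisible(pkg_path)
}

# Usage, guarded so the snippet is skipped when bioAnno is not installed.
# Assumption: fromKEGG returns the package path, as the output suggests.
if (requireNamespace("bioAnno", quietly = TRUE)) {
  pkg_path <- bioAnno::fromKEGG(species = "eco", install = FALSE)
  install_built_package(pkg_path)
}
```

Because the package is written to a temporary directory, it is probably safest to install it (or copy it elsewhere) before the R session ends.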

2.4 Main Functions

fromKEGG builds an annotation package by extracting annotation information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Use the KEGG species code as the query name.

fromNCBI builds an annotation package by extracting annotation information from the NCBI database.

fromENSEMBL builds an annotation package by extracting annotation information from the Ensembl database. Set plant = TRUE to build an annotation package for a plant species.

fromAnnHub builds an annotation package with the AnnotationHub package.

getTable gets an annotation table from a temporary package; the user must provide the path to that temporary package.

3. Using the annotation package you created

An organism-level package (an ‘org’ package) uses a central gene identifier and contains mappings between this identifier and other kinds of identifiers. The most common interface for retrieving data is the select method.

# First build your own annotation package, then load it
data(ath)
fromOwn(geneinfo = ath, install = TRUE)
## Please make sure you have Gene Ontology and KEGG pathway
##         or KO data.frame ready.
## Creating package in /var/folders/p_/0q7dys0x1g53nw6wypk8_qgm0000gn/T//RtmpLyWnIU/org.species.eg.db
library(org.species.eg.db)

Four common methods work together to provide the select interface. The first is columns, which shows which kinds of annotation can be extracted from the package.

columns(org.species.eg.db)
## [1] "ENTREZID"    "EVIDENCE"    "EVIDENCEALL" "GID"         "GO"         
## [6] "GOALL"       "ONTOLOGY"    "ONTOLOGYALL" "PATH"

The next method is keytypes, which tells you which kinds of identifiers can be used as keys.

keytypes(org.species.eg.db)
## [1] "ENTREZID"    "EVIDENCE"    "EVIDENCEALL" "GID"         "GO"         
## [6] "GOALL"       "ONTOLOGY"    "ONTOLOGYALL" "PATH"

The third method is keys, which retrieves all the valid keys of a particular type.

key <- keys(org.species.eg.db,keytype="ENTREZID")

Finally, select extracts data using the values returned by the other methods.

result <- select(org.species.eg.db, keys = key,
                 columns = c("GID", "GO", "PATH"), keytype = "ENTREZID")
## 'select()' returned 1:1 mapping between keys and columns
head(result)
##   ENTREZID       GID         GO  PATH
## 1 10723018 AT4G37553 GO:0008150 01100
## 2 10723019 AT1G27045 GO:0008150 01100
## 3 10723020 AT2G41231 GO:0008150 01100
## 4 10723022 AT5G01542 GO:0008150 01100
## 5 10723023 AT1G24095 GO:0008150 01100
## 6 10723024 AT3G02832 GO:0008150 01100
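
Requesting a column that the package does not provide makes select fail, so it can help to check the request against columns first. The sketch below does that for a handful of keys. It assumes the org.species.eg.db package built above is installed and is skipped otherwise; the `wanted` vector is only an illustration, and "SYMBOL" is included deliberately as a column this particular package does not list.

```r
# Assumes org.species.eg.db was built and installed with fromOwn as above;
# the block is skipped when the package is not available.
if (requireNamespace("org.species.eg.db", quietly = TRUE)) {
  library(org.species.eg.db)

  # Columns we would like to retrieve (illustrative choice).
  wanted <- c("GID", "GO", "ONTOLOGY", "SYMBOL")

  # Keep only the columns this package actually provides.
  available <- intersect(wanted, columns(org.species.eg.db))

  # Query a few keys instead of the full key set.
  some_keys <- head(keys(org.species.eg.db, keytype = "ENTREZID"), 3)
  res <- select(org.species.eg.db, keys = some_keys,
                columns = available, keytype = "ENTREZID")
  print(res)
}
```

Note that select returns one row per key and annotation combination, so a gene with several GO terms appears on several rows.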

You can also use mapIds to extract the KEGG pathways for a set of gene identifiers from the annotation package.

KEGG<-mapIds(org.species.eg.db,keys=key,column="PATH",keytype="ENTREZID")
head(KEGG)
## 10723018 10723019 10723020 10723022 10723023 10723024 
##  "01100"  "01100"  "01100"  "01100"  "01100"  "01100"
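
By default mapIds keeps only the first match for each key, so a gene that belongs to several pathways shows just one of them. The sketch below uses the `multiVals = "list"` argument of AnnotationDbi's mapIds to keep every pathway. It assumes the org.species.eg.db package built above is installed and is skipped otherwise.

```r
# Assumes org.species.eg.db was built and installed with fromOwn as above;
# the block is skipped when the package is not available.
if (requireNamespace("org.species.eg.db", quietly = TRUE)) {
  library(org.species.eg.db)

  genes <- head(keys(org.species.eg.db, keytype = "ENTREZID"))

  # multiVals = "list" returns a named list with one character vector of
  # pathway IDs per gene, instead of only the first match.
  all_paths <- mapIds(org.species.eg.db, keys = genes, column = "PATH",
                      keytype = "ENTREZID", multiVals = "list")

  # Number of pathways annotated for each gene.
  print(lengths(all_paths))
}
```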

mapIds also handles ID conversion:

mapIds(org.species.eg.db,keys=key[1:10],column="GID",keytype="ENTREZID")
##    10723018    10723019    10723020    10723022    10723023    10723024 
## "AT4G37553" "AT1G27045" "AT2G41231" "AT5G01542" "AT1G24095" "AT3G02832" 
##    10723025    10723026    10723027    10723028 
## "AT4G39838" "AT1G12855" "AT4G22758" "AT3G57965"
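
Because GID is also listed among the keytypes, the conversion can run in the opposite direction as well. The following short sketch assumes the same org.species.eg.db package is installed and is skipped otherwise; the two gene IDs are taken from the output shown above.

```r
# Assumes org.species.eg.db was built and installed with fromOwn as above;
# the block is skipped when the package is not available.
if (requireNamespace("org.species.eg.db", quietly = TRUE)) {
  library(org.species.eg.db)

  # Gene IDs copied from the select() output earlier in this section.
  gids <- c("AT4G37553", "AT1G27045")

  # Use GID as the key type and ask for the Entrez ID in return.
  entrez <- mapIds(org.species.eg.db, keys = gids,
                   column = "ENTREZID", keytype = "GID")
  print(entrez)
}
```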

The versions of R and of the packages loaded when generating this vignette were:

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] org.species.eg.db_0.0.1 AnnotationDbi_1.50.3    IRanges_2.22.2         
## [4] S4Vectors_0.26.1        Biobase_2.48.0          BiocGenerics_0.34.0    
## [7] bioAnno_0.99.32         colorout_1.2-2         
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5                    GO.db_3.11.4                 
##  [3] prettyunits_1.1.1             png_0.1-7                    
##  [5] Biostrings_2.56.0             assertthat_0.2.1             
##  [7] digest_0.6.25                 mime_0.9                     
##  [9] BiocFileCache_1.12.1          R6_2.4.1                     
## [11] RSQLite_2.2.0                 evaluate_0.14                
## [13] httr_1.4.2                    pillar_1.4.6                 
## [15] zlibbioc_1.34.0               rlang_0.4.7                  
## [17] progress_1.2.2                curl_4.3                     
## [19] data.table_1.13.0             blob_1.2.1                   
## [21] R.utils_2.10.1                R.oo_1.24.0                  
## [23] rmarkdown_2.3                 AnnotationHub_2.20.2         
## [25] stringr_1.4.0                 RCurl_1.98-1.2               
## [27] bit_4.0.4                     biomaRt_2.44.1               
## [29] shiny_1.5.0                   compiler_4.0.2               
## [31] httpuv_1.5.4                  xfun_0.17                    
## [33] askpass_1.1                   pkgconfig_2.0.3              
## [35] htmltools_0.5.0               openssl_1.4.2                
## [37] tidyselect_1.1.0              KEGGREST_1.28.0              
## [39] tibble_3.0.3                  interactiveDisplayBase_1.26.3
## [41] XML_3.99-0.5                  AnnotationForge_1.30.1       
## [43] crayon_1.3.4                  dplyr_1.0.2                  
## [45] dbplyr_1.4.4                  later_1.1.0.1                
## [47] bitops_1.0-6                  R.methodsS3_1.8.1            
## [49] rappdirs_0.3.1                jsonlite_1.7.1               
## [51] xtable_1.8-4                  lifecycle_0.2.0              
## [53] DBI_1.1.0                     magrittr_1.5                 
## [55] stringi_1.5.3                 XVector_0.28.0               
## [57] promises_1.1.1                ellipsis_0.3.1               
## [59] generics_0.0.2                vctrs_0.3.4                  
## [61] tools_4.0.2                   bit64_4.0.5                  
## [63] glue_1.4.2                    purrr_0.3.4                  
## [65] BiocVersion_3.11.1            hms_0.5.3                    
## [67] fastmap_1.0.1                 yaml_2.2.1                   
## [69] BiocManager_1.30.10           memoise_1.1.0                
## [71] knitr_1.29