library(biomaRt)
library(dplyr)

1 Listing BioMart databases: listMarts

##                biomart               version
## 1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 95
## 2   ENSEMBL_MART_MOUSE      Mouse strains 95
## 3     ENSEMBL_MART_SNP  Ensembl Variation 95
## 4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 95
listMarts
biomart               version                         host_name
ENSEMBL_MART_ENSEMBL  Ensembl Genes 95                www.ensembl.org
ENSEMBL_MART_MOUSE    Mouse strains 95                www.ensembl.org
ENSEMBL_MART_SNP      Ensembl Variation 95            www.ensembl.org
ENSEMBL_MART_FUNCGEN  Ensembl Regulation 95           www.ensembl.org
plants_mart           Ensembl Plants Genes 42         plants.ensembl.org
plants_variations     Ensembl Plants Variations 42    plants.ensembl.org
fungi_mart            Ensembl Fungi Genes 42          fungi.ensembl.org
fungi_variations      Ensembl Fungi Variations 42     fungi.ensembl.org
protists_mart         Ensembl Protists Genes 42       protists.ensembl.org
protists_variations   Ensembl Protists Variations 42  protists.ensembl.org
metazoa_mart          Ensembl Metazoa Genes 42        metazoa.ensembl.org
metazoa_variations    Ensembl Metazoa Variations 42   metazoa.ensembl.org
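
The table above combines the marts of several Ensembl hosts. The following is a minimal sketch of how such a table might be assembled; it is not the original code of this page. The helper names `marts_on_host` and `all_marts` and the `fetch` hook are hypothetical, and the host strings are copied from the host_name column (newer biomaRt releases may expect an `https://` prefix).

```r
# Hosts as they appear in the host_name column of the table above.
# Assumption: these strings are accepted as-is by the installed biomaRt version.
hosts <- c("www.ensembl.org", "plants.ensembl.org", "fungi.ensembl.org",
           "protists.ensembl.org", "metazoa.ensembl.org")

# Query one host and record which host the rows came from.
# `fetch` is a hypothetical hook so the function can be exercised offline;
# by default it is biomaRt::listMarts, which needs network access.
marts_on_host <- function(host, fetch = biomaRt::listMarts) {
  marts <- fetch(host = host)
  marts$host_name <- rep(host, nrow(marts))
  marts
}

# Stack the per-host results into a single data frame.
all_marts <- function(hosts, fetch = biomaRt::listMarts) {
  do.call(rbind, lapply(hosts, marts_on_host, fetch = fetch))
}
```

With network access, `all_marts(hosts)` should give one row per mart with the columns biomart, version and host_name.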

2 Browsing the datasets available in a BioMart database: listDatasets

2.1 Building a list of datasets across multiple hosts and marts in advance

DataSet
biomart               host_name        dataset                     description                               version
ENSEMBL_MART_ENSEMBL  www.ensembl.org  acalliptera_gene_ensembl    Eastern happy genes (fAstCal1.2)          fAstCal1.2
ENSEMBL_MART_ENSEMBL  www.ensembl.org  acarolinensis_gene_ensembl  Anole lizard genes (AnoCar2.0)            AnoCar2.0
ENSEMBL_MART_ENSEMBL  www.ensembl.org  acitrinellus_gene_ensembl   Midas cichlid genes (Midas_v5)            Midas_v5
ENSEMBL_MART_ENSEMBL  www.ensembl.org  amelanoleuca_gene_ensembl   Panda genes (ailMel1)                     ailMel1
ENSEMBL_MART_ENSEMBL  www.ensembl.org  amexicanus_gene_ensembl     Cave fish genes (Astyanax_mexicanus-2.0)  Astyanax_mexicanus-2.0
ENSEMBL_MART_ENSEMBL  www.ensembl.org  anancymaae_gene_ensembl     Ma’s night monkey genes (Anan_2.0)        Anan_2.0
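
A table like the one above could be built by looping over every (biomart, host_name) pair and tagging each `listDatasets` result with its origin. The sketch below assumes a data frame with `biomart` and `host_name` columns as input; the function names `fetch_datasets`, `tag_datasets` and `build_dataset_table` are hypothetical and not taken from the original page.

```r
# Network step: connect to one mart on one host and list its datasets.
fetch_datasets <- function(biomart, host) {
  mart <- biomaRt::useMart(biomart = biomart, host = host)
  biomaRt::listDatasets(mart)
}

# Pure step: prepend the mart and host so rows stay traceable after stacking.
tag_datasets <- function(ds, biomart, host) {
  data.frame(biomart   = rep(biomart, nrow(ds)),
             host_name = rep(host, nrow(ds)),
             ds,
             stringsAsFactors = FALSE)
}

# Build one table for all marts. `fetch` is a hypothetical hook that allows
# an offline stand-in; by default it performs the real queries.
build_dataset_table <- function(marts, fetch = fetch_datasets) {
  pieces <- Map(function(b, h) tag_datasets(fetch(b, h), b, h),
                as.character(marts$biomart),
                as.character(marts$host_name))
  do.call(rbind, unname(pieces))
}
```

Querying every host takes a while, which is presumably why the list is prepared once in advance and reused.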

2.2 Getting the mart for a given species from the dataset list: useMart

  • Connects to the specified BioMart database and to a dataset within it.
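
As a rough sketch of this step, one can search the description column of the dataset table for a species name and pass the matching row to `useMart`. The helper names `find_datasets` and `connect_to_dataset` are hypothetical, and the input is assumed to have the biomart, host_name, dataset and description columns shown in section 2.1.

```r
# Keep the rows of a dataset table whose description mentions the species.
find_datasets <- function(dataset_table, pattern) {
  hits <- grepl(pattern, dataset_table$description, ignore.case = TRUE)
  dataset_table[hits, , drop = FALSE]
}

# Connect to the first matching dataset (needs network access).
# useMart() takes the mart name, the dataset name and the host.
connect_to_dataset <- function(dataset_rows) {
  biomaRt::useMart(biomart = as.character(dataset_rows$biomart[1]),
                   dataset = as.character(dataset_rows$dataset[1]),
                   host    = as.character(dataset_rows$host_name[1]))
}
```

Checking the rows returned by `find_datasets` before connecting is worthwhile, because a common name can match more than one assembly.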

3 Retrieving data from the database

3.1 Listing the attributes available in the selected dataset: listAttributes

Attributes of mart
name                           description                   page
ensembl_gene_id                Gene stable ID                feature_page
ensembl_gene_id_version        Gene stable ID version        feature_page
ensembl_transcript_id          Transcript stable ID          feature_page
ensembl_transcript_id_version  Transcript stable ID version  feature_page
ensembl_peptide_id             Protein stable ID             feature_page
ensembl_peptide_id_version     Protein stable ID version     feature_page
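
The full attribute list is long, so filtering it by keyword is often practical. The sketch below assumes the data frame returned by `listAttributes` has the name and description columns shown above; `filter_attributes` and `attributes_matching` are hypothetical helper names.

```r
# Keep attributes whose name or description matches the pattern.
filter_attributes <- function(attrs, pattern) {
  hits <- grepl(pattern, attrs$name, ignore.case = TRUE) |
          grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hits, , drop = FALSE]
}

# Fetch the attribute list of a connected mart and filter it
# (needs network access; `mart` is an object returned by useMart()).
attributes_matching <- function(mart, pattern) {
  filter_attributes(biomaRt::listAttributes(mart), pattern)
}
```

The names found this way are the strings to pass to `getBM` in the next step.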

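3.2 Retrieving the specified data from the database: getBM

  • The attr_2 result below is arranged to resemble the BED format (in actual BED, however, the fifth column is 'score').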
Attributes of mart(id retrieve)
chromosome_name  ensembl_transcript_id  ensembl_peptide_id  external_gene_name  description
MT               ENSCGRT00000000001
MT               ENSCGRT00000000002
MT               ENSCGRT00000000003
MT               ENSCGRT00000000004
MT               ENSCGRT00000000005
MT               ENSCGRT00000000006     ENSCGRP00000000001  ND1                 NADH dehydrogenase subunit 1 [Source:NCBI gene;Acc:3979183]
Attributes of mart(bed like)
chromosome_name  start_position  end_position  external_gene_name  ensembl_transcript_id  strand  cds_start  cds_end  description
JH000064.1       2214310         2216976                           ENSCGRT00000009188     1       1          67       C-type lectin domain family 10 member A-like [Source:NCBI gene;Acc:100768594]
JH000064.1       2214310         2216976                           ENSCGRT00000009188     1       68         181      C-type lectin domain family 10 member A-like [Source:NCBI gene;Acc:100768594]
JH000064.1       2214310         2216976                           ENSCGRT00000009188     1       182        277      C-type lectin domain family 10 member A-like [Source:NCBI gene;Acc:100768594]
JH000064.1       2214310         2216976                           ENSCGRT00000009188     1       278        349      C-type lectin domain family 10 member A-like [Source:NCBI gene;Acc:100768594]
JH000064.1       2214310         2216976                           ENSCGRT00000009188     1       350        421      C-type lectin domain family 10 member A-like [Source:NCBI gene;Acc:100768594]
JH000064.1       2214310         2216976                           ENSCGRT00000009188     1       422        508      C-type lectin domain family 10 member A-like [Source:NCBI gene;Acc:100768594]
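
The two tables above could come from `getBM` calls along the following lines. This is a sketch, not the original code: the attribute vectors are copied from the column headers, `attr_1` is an assumed name for the first set, and `fetch_attributes` and `to_bed6` are hypothetical helpers. `to_bed6` illustrates what separates the BED-like layout from actual BED: BED starts are 0-based, the fifth column is a score (a placeholder 0 here), and strand is written as "+" or "-".

```r
# Attribute names taken from the column headers of the two tables above.
attr_1 <- c("chromosome_name", "ensembl_transcript_id", "ensembl_peptide_id",
            "external_gene_name", "description")
attr_2 <- c("chromosome_name", "start_position", "end_position",
            "external_gene_name", "ensembl_transcript_id", "strand",
            "cds_start", "cds_end", "description")

# Retrieve the attributes for every entry of the dataset (needs network
# access; `mart` is an object returned by useMart() with a dataset selected).
fetch_attributes <- function(mart, attributes) {
  biomaRt::getBM(attributes = attributes, mart = mart)
}

# Convert a BED-like result into BED6 columns.
# Ensembl coordinates are 1-based, so chromStart is start_position - 1.
# The score is a placeholder (assumption), and the transcript ID is used as
# the name because external_gene_name can be empty.
to_bed6 <- function(df) {
  bed <- data.frame(
    chrom      = as.character(df$chromosome_name),
    chromStart = df$start_position - 1L,
    chromEnd   = df$end_position,
    name       = as.character(df$ensembl_transcript_id),
    score      = rep(0L, nrow(df)),
    strand     = ifelse(df$strand == 1, "+", "-"),
    stringsAsFactors = FALSE
  )
  # The per-CDS rows repeat the same gene coordinates, so drop duplicates.
  unique(bed)
}
```

Calling `getBM` without filters downloads the whole dataset, which can be slow; passing `filters` and `values` narrows the query.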

4 Retrieving sequences: getSequence

4.1 Sequences are returned as a data frame

4.1.1 A

cdna ensembl_transcript_id
ATGACAATTACATACGAAAACTTCCAGAACTCAGGAATCGAGGAGAAAAA ENSCGRT00000009188
ATTACATACGAAAACTTCCAGAACTCAGGAATCGAGGAGAAAAACCCAGA ENSCGRT00000009189
TCTCTGGAGAGCACAGTGGAGAAAAAGGAACAGCAATTCAAAACAGGTCT ENSCGRT00000009190

4.1.2 B

coding ensembl_transcript_id
ATTACATACGAAAACTTCCAGAACTCAGGAATCGAGGAGAAAAACCCAGA ENSCGRT00000009189
ATGACAATTACATACGAAAACTTCCAGAACTCAGGAATCGAGGAGAAAAA ENSCGRT00000009188
TCTCTGGAGAGCACAGTGGAGAAAAAGGAACAGCAATTCAAAACAGGTCT ENSCGRT00000009190

4.1.3 C

peptide ensembl_transcript_id
MTITYENFQNSGIEEKNPEIGKAAPPKSFLWDIFSWTRLLLFSLGLGLLL ENSCGRT00000009188
ITYENFQNSGIEEKNPEIGKAAPPKSFLWDIFSWTRLLLFSLGLGLLLLV ENSCGRT00000009189
SLESTVEKKEQQFKTGLSEITERVQELGKDLKALSCQLASLKNNGSAMAC ENSCGRT00000009190
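
The three results above differ only in the sequence type, and table B comes back in a different row order from A and C. A hedged sketch of the calls, with a helper that joins results by transcript ID rather than by row position, might look like this. The seqType values "cdna", "coding" and "peptide" are inferred from the column headers; `fetch_sequences` and `combine_sequences` are hypothetical names.

```r
# Fetch one sequence type for a set of transcript IDs (needs network access).
# The result has two columns: the sequence, named after seq_type, and
# ensembl_transcript_id.
fetch_sequences <- function(mart, transcript_ids, seq_type) {
  biomaRt::getSequence(id      = transcript_ids,
                       type    = "ensembl_transcript_id",
                       seqType = seq_type,
                       mart    = mart)
}

# Join several sequence data frames on the transcript ID, since the rows of
# separate queries are not guaranteed to come back in the same order.
combine_sequences <- function(...) {
  Reduce(function(x, y) merge(x, y, by = "ensembl_transcript_id"),
         list(...))
}
```

For example, the A, B and C results could be merged into one data frame with the columns ensembl_transcript_id, cdna, coding and peptide.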

5 Environment

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] ja_JP.UTF-8/ja_JP.UTF-8/ja_JP.UTF-8/C/ja_JP.UTF-8/ja_JP.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2.2 dplyr_0.7.8    biomaRt_2.36.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0           highr_0.7            pillar_1.3.1        
##  [4] compiler_3.5.1       bindr_0.1.1          prettyunits_1.0.2   
##  [7] bitops_1.0-6         tools_3.5.1          progress_1.2.0      
## [10] digest_0.6.18        bit_1.1-14           tibble_2.0.0        
## [13] RSQLite_2.1.1        evaluate_0.12        memoise_1.1.0       
## [16] pkgconfig_2.0.2      rlang_0.3.1          DBI_1.0.0           
## [19] curl_3.2             yaml_2.2.0           parallel_3.5.1      
## [22] stringr_1.3.1        httr_1.3.1           knitr_1.20          
## [25] S4Vectors_0.18.3     IRanges_2.14.12      hms_0.4.2           
## [28] tidyselect_0.2.5     stats4_3.5.1         rprojroot_1.3-2     
## [31] bit64_0.9-7          glue_1.3.0           Biobase_2.40.0      
## [34] R6_2.3.0             AnnotationDbi_1.42.1 XML_3.98-1.16       
## [37] rmarkdown_1.10       purrr_0.2.5          blob_1.1.1          
## [40] magrittr_1.5         backports_1.1.2      htmltools_0.3.6     
## [43] BiocGenerics_0.26.0  assertthat_0.2.0     stringi_1.2.4       
## [46] RCurl_1.95-4.11      crayon_1.3.4