1. CTD

Provides drug-disease-target gene data. Use the full background database.

http://ctdbase.org/tools/batchQuery.go: searched COVID-19 and batch-downloaded with the ALL, Inferred, and Curated options. The counts do not match; the results may need to be filtered by InferenceScore.
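A minimal sketch of the InferenceScore filter suggested above, assuming the download is a tab-separated table with `DirectEvidence` and `InferenceScore` columns. The sample rows, the placeholder disease ID, and the threshold of 20 are invented for illustration and would need to be checked against the actual file.

```python
import csv
import io

# Hypothetical sample in a CTD-like layout; every value here is invented.
SAMPLE = (
    "ChemicalName\tDiseaseID\tDirectEvidence\tInferenceScore\n"
    "ChemA\tMESH:D_EXAMPLE\ttherapeutic\t\n"
    "ChemB\tMESH:D_EXAMPLE\t\t35.2\n"
    "ChemC\tMESH:D_EXAMPLE\t\t4.7\n"
)


def split_associations(handle, min_score):
    """Return (curated, inferred) rows.

    Rows with a non-empty DirectEvidence are treated as curated; the rest
    are kept as inferred only if InferenceScore >= min_score.
    """
    curated, inferred = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["DirectEvidence"]:
            curated.append(row)
        elif row["InferenceScore"] and float(row["InferenceScore"]) >= min_score:
            inferred.append(row)
    return curated, inferred


# The threshold is an assumption, not a value recommended by CTD.
curated, inferred = split_associations(io.StringIO(SAMPLE), min_score=20.0)
print(len(curated), len(inferred))
```

Counting curated and inferred rows separately may also help explain why the ALL, Inferred, and Curated totals disagree.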

http://ctdbase.org/detail.go?type=disease&acc=MESH%3AD000086382&view=gene: the downloaded file contains no gene information.

In the end, the complete backend database download from http://ctdbase.org/downloads/#cd was used.

Drug IDs are MeSH IDs, the same as in PubChem. Disease IDs are MeSH IDs or OMIM IDs, but there is no record for COVID-19.

## [1] 14448

2. DisGeNET

https://www.disgenet.org/covid/genes/summary/

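Virus-host protein interactions (but the specific viral protein is not mentioned; it appears to be absent). There is no download link, and the data obtained from the Download page are summary statistics, for example the number of genes associated with each disease.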

TO DISCUSS.

The database provides the following files (described in https://www.disgenet.org/static/disgenet_ap1/files/downloads/readme.txt). For Gene-Disease associations:

disease_associations.tsv.gz => Diseases associated to genes from DisGeNET

The columns in the files are:
diseaseId -> UMLS concept unique identifier
diseaseName -> Name of the disease
diseaseType -> The DisGeNET disease type: disease, phenotype and group
diseaseClass -> The MeSH disease class(es)
diseaseSemanticType -> The UMLS Semantic Type(s) of the disease
NofGenes -> Number of genes associated to the disease
NofPmids -> Number of publications associated to the disease

gene_associations.tsv.gz => Genes associated to Diseases from DisGeNET

The columns in the files are:
geneId -> NCBI Entrez Gene Identifier
geneSymbol -> Official Gene Symbol
DSI -> The Disease Specificity Index for the gene
DPI -> The Disease Pleiotropy Index for the gene
PLI -> The probability for the gene of being loss-of-function intolerant, provided by the GNOMAD consortium
protein_class -> Protein Class identifier according to the Drug Target Ontology
protein_class_name -> Protein Class according to the Drug Target Ontology
NofDiseases -> Number of diseases associated to the gene
NofPmids -> Number of publications associated to the gene

For Variant-Disease associations:

disease_associations.tsv.gz => Diseases associated to variants from DisGeNET

The columns in the files are:
diseaseId -> UMLS concept unique identifier
diseaseName -> Name of the disease
diseaseType -> The DisGeNET disease type: disease, phenotype and group
diseaseClass -> The MeSH disease class(es)
diseaseSemanticType -> The UMLS Semantic Type(s) of the disease
NofSnps -> Number of variants associated to the disease
NofPmids -> Total number of publications reporting the Variant-Disease association

variant_associations.tsv.gz => Variants associated to diseases from DisGeNET

The columns in the files are:
snpId -> dbSNP variant Identifier
chromosome -> Chromosome of the variant
position -> Position in chromosome
DSI -> The Disease Specificity Index for the variant
DPI -> The Disease Pleiotropy Index for the variant
NofDiseases -> Number of diseases associated to the variant
NofPmids -> Total number of publications reporting the Variant-Disease association

disease_mappings.tsv.gz => Mappings from UMLS concept unique identifier to disease vocabularies: DO, EFO, HPO, ICD9CM, MSH, NCI, OMIM, and ORDO

variant_to_gene_mappings.tsv.gz => Variant mapped to their corresponding genes, according to dbSNP.

The columns in the files are:
snpId -> dbSNP variant Identifier
geneId -> NCBI Entrez Gene Identifier
geneSymbol -> Official Gene Symbol
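As a rough illustration of how the gene_associations columns listed above might be loaded, here is a sketch that reads a tab-separated table with those column names and converts the numeric fields. The two sample rows, the gene symbols, and the DSI cutoff of 0.6 are invented; a real file would be gzip-compressed and could be opened with `gzip.open(path, "rt")` instead of the in-memory sample.

```python
import csv
import io

# Column names follow the readme quoted above; the two rows are invented.
SAMPLE = (
    "geneId\tgeneSymbol\tDSI\tDPI\tPLI\tprotein_class\t"
    "protein_class_name\tNofDiseases\tNofPmids\n"
    "1\tGENE_A\t0.70\t0.54\t0.01\t\t\t27\t30\n"
    "2\tGENE_B\t0.53\t0.85\t\tDTO_X\tEnzyme\t120\t210\n"
)


def to_float(text):
    """Convert a cell to float, mapping an empty cell to None."""
    return float(text) if text else None


def load_gene_associations(handle):
    """Read the table into a list of dicts with typed numeric fields."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "geneId": int(rec["geneId"]),
            "geneSymbol": rec["geneSymbol"],
            "DSI": to_float(rec["DSI"]),
            "DPI": to_float(rec["DPI"]),
            "PLI": to_float(rec["PLI"]),
            "protein_class_name": rec["protein_class_name"],
            "NofDiseases": int(rec["NofDiseases"]),
            "NofPmids": int(rec["NofPmids"]),
        })
    return rows


genes = load_gene_associations(io.StringIO(SAMPLE))
# Illustrative cutoff only: genes with a high Disease Specificity Index.
specific = [g["geneSymbol"] for g in genes if g["DSI"] is not None and g["DSI"] >= 0.6]
print(specific)
```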

3. Drugbank

https://go.drugbank.com/covid-19

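Drug and target information. The 393 entries may need to be downloaded directly from the web page, since the backend database does not necessarily include them.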

4. GeneCards

https://www.genecards.org/Search/Keyword?queryString=covid-19

Only shows a list of associated human genes, with no interaction information. Skip?

5. PubChem

https://pubchem.ncbi.nlm.nih.gov/#query=covid-19&tab=gene

Only a list of SARS-CoV-2 viral genes. Skip?

6. TTD Therapeutic Target Database

http://db.idrblab.net/ttd/search/ttd/covid19-target?search_api_fulltext=COVID-19

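Includes drug information for human target proteins, but it only states that a protein is a target, not which viral protein it is a target of?

Can serve as an extension of the DrugBank drug-target data.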

7. NCBI gene

https://www.ncbi.nlm.nih.gov/gene/?term=covid-19

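Only a list of viral proteins. It appears to simply aggregate the protein lists of different databases; some individual databases may be incomplete? Not used.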

8. OMIM

https://omim.org/search?index=entry&search=COVID-19&start=1&limit=10&retrieve=geneMap&genemap_exists=true

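OMIM has added COVID records, but they are only a list of human genes, with no information linking them to viral proteins. Could be used to extend the network.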

9. PharmGKB

https://www.pharmgkb.org/disease/PA166197121/related#genes

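Only provides a list of human genes, again with no specific information linking them to viral proteins. Same as above?

Other downloadable data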

Variant, Gene and Drug Relationship Data

Relationships summarized from PharmGKB annotations.

Clinical Guideline Annotations

Detailed clinical guideline annotations in JSON format:

Drug Label Annotations

Drug label annotations in TSV format:

Clinical Variant Data

This file contains a list of variant-drug pairs and level of evidence for all clinical annotations in TSV format:

Reactome

Protein-Protein Interactions derived from Reactome pathways full dataset.

Code has already been written to convert BioPAX 3 into wide-table form: pathway has_component; reaction has_input …; reaction has_output …

## [1] 103285      9
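The conversion described above can be sketched roughly as follows. This is not the code used to produce the table; it is a small illustration that assumes `pathwayComponent` maps to has_component and that `left`/`right` of a `BiochemicalReaction` map to has_input/has_output (which ignores `conversionDirection`). The inline BioPAX snippet is a hand-written example, and the resulting triples would still need to be pivoted into the wide layout.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-written BioPAX 3 fragment, for illustration only.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:Pathway rdf:ID="Pathway1">
    <bp:pathwayComponent rdf:resource="#BiochemicalReaction1"/>
  </bp:Pathway>
  <bp:BiochemicalReaction rdf:ID="BiochemicalReaction1">
    <bp:left rdf:resource="#Protein1"/>
    <bp:left rdf:resource="#SmallMolecule1"/>
    <bp:right rdf:resource="#Protein2"/>
  </bp:BiochemicalReaction>
</rdf:RDF>"""

# Assumed mapping from BioPAX properties to the relation names in these notes.
RELATIONS = {
    ("Pathway", "pathwayComponent"): "has_component",
    ("BiochemicalReaction", "left"): "has_input",
    ("BiochemicalReaction", "right"): "has_output",
}


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return tag.rsplit("}", 1)[-1]


def biopax_relations(xml_text):
    """Return (subject, relation, object) triples for the mapped properties."""
    rows = []
    for elem in ET.fromstring(xml_text):
        subject = elem.get(f"{{{RDF}}}ID")
        for prop in elem:
            relation = RELATIONS.get((local(elem.tag), local(prop.tag)))
            target = prop.get(f"{{{RDF}}}resource")
            if relation and target:
                rows.append((subject, relation, target.lstrip("#")))
    return rows


triples = biopax_relations(SAMPLE)
for triple in triples:
    print(*triple, sep="\t")
```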

Reactome

Parsing the BioPAX file of a single pathway, Regulation of Apoptosis

https://reactome.org/content/detail/R-HSA-169911

## [1] "3"
## [1] "Regulation of ApoptosisA regulated balance between cell survival and apoptosis is essential for normal\r\ndevelopment and homeostasis of multicellular organisms  (see Matsuzawa, 2001).  Defects in control of this balance may contribute  to autoimmune disease, neurodegeneration and cancer.  Protein ubiquitination and degradation is one of the major mechanisms that regulate apoptotic cell death (reviewed in Yang and Yu 2003).Authored: Jakobi, R, 2008-02-05 11:04:14Reviewed: Chang, E, 2008-05-21 00:05:41Edited: Matthews, L, 2008-02-12 16:13:24Edited: Matthews, L, 2008-06-12 00:23:53"                                                                                                             
## [2] "Regulation of activated PAK-2p34 by proteasome mediated degradationStimulation of cell death by PAK-2  requires the generation and stabilization of the caspase-activated form, PAK-2p34 (Walter et al., 1998;Jakobi et al., 2003).  Levels of proteolytically activated PAK-2p34 protein are controlled by ubiquitin-mediated proteolysis. PAK-2p34 but not full-length PAK-2 is degraded  by the 26 S proteasome (Jakobi et al., 2003). It is  not known whether ubiquitination and degradation of PAK-2p34 occurs in the cytoplasm or in the nucleus.Authored: Jakobi, R, 2008-02-05 11:04:14Reviewed: Chang, E, 2008-05-21 00:05:41Edited: Matthews, L, 2008-02-03 20:50:13Edited: Matthews, L, 2008-06-12 00:23:53"
## [3] "Regulation of PAK-2p34 activity by PS-GAP/RHG10PS-GAP (RGH10) interacts specifically with caspase-activated PAK-2p34 reducing the ability of PAK-2p34 to induce cell death. This interaction inhibits the kinase activity of PAK-2p34 and changes the localization of PAK-2p34 from the nucleus to the perinuclear  region (Koeppel et al., 2004).Authored: Jakobi, R, 2008-02-05 11:04:14Reviewed: Chang, E, 2008-05-21 00:05:41Edited: Matthews, L, 2008-02-03 20:50:13Edited: Matthews, L, 2008-06-12 00:23:53"
##  [1] "A regulated balance between cell survival and apoptosis is essential for normal\r\ndevelopment and homeostasis of multicellular organisms  (see Matsuzawa, 2001).  Defects in control of this balance may contribute  to autoimmune disease, neurodegeneration and cancer.  Protein ubiquitination and degradation is one of the major mechanisms that regulate apoptotic cell death (reviewed in Yang and Yu 2003)."                                                                 
##  [2] "Authored: Jakobi, R, 2008-02-05 11:04:14"                                                                                                                                                                                                                                                                                                                                                                                                                                             
##  [3] "Reviewed: Chang, E, 2008-05-21 00:05:41"                                                                                                                                                                                                                                                                                                                                                                                                                                              
##  [4] "Edited: Matthews, L, 2008-02-12 16:13:24"                                                                                                                                                                                                                                                                                                                                                                                                                                             
##  [5] "Edited: Matthews, L, 2008-06-12 00:23:53"                                                                                                                                                                                                                                                                                                                                                                                                                                             
##  [6] "Stimulation of cell death by PAK-2  requires the generation and stabilization of the caspase-activated form, PAK-2p34 (Walter et al., 1998;Jakobi et al., 2003).  Levels of proteolytically activated PAK-2p34 protein are controlled by ubiquitin-mediated proteolysis. PAK-2p34 but not full-length PAK-2 is degraded  by the 26 S proteasome (Jakobi et al., 2003). It is  not known whether ubiquitination and degradation of PAK-2p34 occurs in the cytoplasm or in the nucleus."
##  [7] "Authored: Jakobi, R, 2008-02-05 11:04:14"                                                                                                                                                                                                                                                                                                                                                                                                                                             
##  [8] "Reviewed: Chang, E, 2008-05-21 00:05:41"                                                                                                                                                                                                                                                                                                                                                                                                                                              
##  [9] "Edited: Matthews, L, 2008-02-03 20:50:13"                                                                                                                                                                                                                                                                                                                                                                                                                                             
## [10] "Edited: Matthews, L, 2008-06-12 00:23:53"                                                                                                                                                                                                                                                                                                                                                                                                                                             
## [11] "PS-GAP (RGH10) interacts specifically with caspase-activated PAK-2p34 reducing the ability of PAK-2p34 to induce cell death. This interaction inhibits the kinase activity of PAK-2p34 and changes the localization of PAK-2p34 from the nucleus to the perinuclear  region (Koeppel et al., 2004)."                                                                                                                                                                                  
## [12] "Authored: Jakobi, R, 2008-02-05 11:04:14"                                                                                                                                                                                                                                                                                                                                                                                                                                             
## [13] "Reviewed: Chang, E, 2008-05-21 00:05:41"                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [14] "Edited: Matthews, L, 2008-02-03 20:50:13"                                                                                                                                                                                                                                                                                                                                                                                                                                             
## [15] "Edited: Matthews, L, 2008-06-12 00:23:53"
##                  class                   id property property_attr
## 1:           BioSource           BioSource1     name  rdf:datatype
## 2:           BioSource           BioSource1     xref  rdf:resource
## 3: BiochemicalReaction BiochemicalReaction1  comment  rdf:datatype
## 4: BiochemicalReaction BiochemicalReaction1  comment  rdf:datatype
## 5: BiochemicalReaction BiochemicalReaction1  comment  rdf:datatype
## 6: BiochemicalReaction BiochemicalReaction1  comment  rdf:datatype
##                        property_attr_value
## 1: http://www.w3.org/2001/XMLSchema#string
## 2:                       #UnificationXref2
## 3: http://www.w3.org/2001/XMLSchema#string
## 4: http://www.w3.org/2001/XMLSchema#string
## 5: http://www.w3.org/2001/XMLSchema#string
## 6: http://www.w3.org/2001/XMLSchema#string
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   property_value
## 1:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Homo sapiens
## 2:                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## 3: PAK-2p34 is ubiquitinated prior to degradation (Jakobi et al., 2003). Here, ubiquitination of PAK-2p34 is described as occurring in the cytosol. However, to date it is  not known whether this occurs in the nucleus or in the cytoplasm. Evidence for this reaction comes from experiments using  both human and rabbit proteins. The polyubiquitin synthesized in the reaction is inferred to contain lysine-48 (K48) linkages because the modified protein is targeted to the proteasome (Komander 2009).
## 4:                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Authored: Jakobi, R, 2008-02-05 11:04:14
## 5:                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Reviewed: Chang, E, 2008-05-21 00:05:41
## 6:                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Edited: Matthews, L, 2008-02-03 20:50:13
## [1] "Regulation of Apoptosis"

## [1] "BiochemicalReaction1" "BiochemicalReaction2" "BiochemicalReaction3"
## [4] "BiochemicalReaction4" "BiochemicalReaction5"
##          property_value
## 1: OMA1 hydrolyses OPA1
##       class        id         property property_attr
##  1: Protein Protein17 cellularLocation  rdf:resource
##  2: Protein Protein17          comment  rdf:datatype
##  3: Protein Protein17       dataSource  rdf:resource
##  4: Protein Protein17      displayName  rdf:datatype
##  5: Protein Protein17  entityReference  rdf:resource
##  6: Protein Protein17          feature  rdf:resource
##  7: Protein Protein17          feature  rdf:resource
##  8: Protein Protein17          feature  rdf:resource
##  9: Protein Protein17             name  rdf:datatype
## 10: Protein Protein17             xref  rdf:resource
## 11: Protein Protein17             xref  rdf:resource
##                         property_attr_value                     property_value
##  1:            #CellularLocationVocabulary1                                   
##  2: http://www.w3.org/2001/XMLSchema#string            Reactome DB_ID: 3095873
##  3:                            #Provenance1                                   
##  4: http://www.w3.org/2001/XMLSchema#string          K48polyUb-p-T402-PAK-2p43
##  5:                      #ProteinReference1                                   
##  6:                   #ModificationFeature1                                   
##  7:                   #ModificationFeature2                                   
##  8:                      #FragmentFeature16                                   
##  9: http://www.w3.org/2001/XMLSchema#string K48polyUb-phospho-PAK-2p34(Thr402)
## 10:                      #UnificationXref42                                   
## 11:                      #UnificationXref43
##    property_value
## 1:           OPA1
##    property_value
## 1:           OPA1
## 2:            H2O
##                                                          property_value
## 1:                                                         OPA1(88-194)
## 2: Dynamin-like 120 kDa protein, mitochondrial ecNumber3.6.5.5/ecNumber
## 3:                                                        OPA1(195-960)
## 4: Dynamin-like 120 kDa protein, mitochondrial ecNumber3.6.5.5/ecNumber
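The long table printed above (class, id, property, property_attr, property_attr_value, property_value) could be produced along the following lines. This is a sketch, not the original parsing code: it assumes every top-level element carries an `rdf:ID`, that all property attributes live in the rdf namespace, and it uses a hand-written fragment modelled on the BioSource1 rows.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hand-written fragment modelled on the BioSource1 rows shown above.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:BioSource rdf:ID="BioSource1">
    <bp:name rdf:datatype="{XSD_STRING}">Homo sapiens</bp:name>
    <bp:xref rdf:resource="#UnificationXref2"/>
  </bp:BioSource>
</rdf:RDF>"""


def local(name):
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return name.rsplit("}", 1)[-1]


def long_table(xml_text):
    """Emit one row per (element, property, attribute) combination."""
    rows = []
    for elem in ET.fromstring(xml_text):
        for prop in elem:
            for attr, attr_value in prop.attrib.items():
                rows.append({
                    "class": local(elem.tag),
                    "id": elem.get(f"{{{RDF}}}ID"),
                    "property": local(prop.tag),
                    # Assumption: attributes are all in the rdf namespace.
                    "property_attr": "rdf:" + local(attr),
                    "property_attr_value": attr_value,
                    "property_value": (prop.text or "").strip(),
                })
    return rows


table = long_table(SAMPLE)
for row in table:
    print(row)
```

Keeping one row per property makes it straightforward to filter by class or property afterwards, for example to pull all displayName values of Protein entries.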

View the Reactome Neo4j graph database

/etc/init.d/neo4j restart, then open http://127.0.0.1:7474

KEGG

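Does not provide a full dataset download.

Converted the KGML of a single pathway, Apoptosis, to a data.frame.

This yields a protein interaction table that includes activation and inhibition information.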
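To illustrate how activation and inhibition information can be read from KGML, here is a sketch that expands each `relation` element into gene-gene rows and attaches its `subtype` names. The pathway fragment and gene IDs are invented, and expanding multi-gene entries into all pairwise combinations is an assumption about how the table should be built.

```python
import xml.etree.ElementTree as ET

# Invented KGML fragment; entry names and the pathway number are placeholders.
SAMPLE = """<pathway name="path:hsa00000" org="hsa" number="00000">
  <entry id="1" name="hsa:1111" type="gene"/>
  <entry id="2" name="hsa:2222 hsa:3333" type="gene"/>
  <entry id="3" name="hsa:4444" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
  <relation entry1="3" entry2="1" type="PPrel">
    <subtype name="inhibition" value="--|"/>
    <subtype name="phosphorylation" value="+p"/>
  </relation>
</pathway>"""


def kgml_interactions(xml_text):
    """Return (source, target, relation_type, effects) rows from KGML text."""
    root = ET.fromstring(xml_text)
    # An entry may list several genes separated by spaces.
    genes = {e.get("id"): e.get("name").split() for e in root.iter("entry")}
    rows = []
    for rel in root.iter("relation"):
        effects = ";".join(s.get("name") for s in rel.iter("subtype"))
        for src in genes.get(rel.get("entry1"), []):
            for dst in genes.get(rel.get("entry2"), []):
                rows.append((src, dst, rel.get("type"), effects))
    return rows


interactions = kgml_interactions(SAMPLE)
for row in interactions:
    print(*row, sep="\t")
```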

KEGG

Downloaded the 347 hsa KEGG KGML files and converted them into an interaction network.

## [1] 124718      5

ConsensusPathDB

ConsensusPathDB_human_PPI network in tab-delimited format and PSI-MI (level 2.5) format, plus the CPDB_pathways_genes txt file, from http://cpdb.molgen.mpg.de/

## [1] 554471      9
## [1] 4681    4

BioGRID full dataset

Biogrid-all-mitab: PSI-MI 2.5 XML to table

Interactions only, with no direction.

## [1] 2407713      15
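Because these interactions carry no direction, A-B and B-A describe the same edge. A small sketch of one way to handle this when merging with directed tables is to canonicalise each pair before deduplicating; the gene names below are placeholders, not BioGRID records.

```python
def undirected_pairs(rows):
    """Collapse (a, b) and (b, a) into one canonical, sorted pair.

    The first occurrence of each unordered pair is kept, in input order.
    """
    seen = set()
    unique = []
    for a, b in rows:
        key = tuple(sorted((a, b)))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Placeholder interactor names, for illustration only.
edges = [
    ("GENE_B", "GENE_A"),
    ("GENE_A", "GENE_B"),  # same edge as above, reversed
    ("GENE_C", "GENE_C"),  # self-interaction is kept as-is
    ("GENE_A", "GENE_C"),
]
unique_edges = undirected_pairs(edges)
print(unique_edges)
```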

BioGRID Corona-virus tab file

BIOGRID-CORONAVIRUS-4.4.212.tab3.txt

## [1] 35076    37

BioGRID Corona-virus: updated data format and structure, 20220901

https://downloads.thebiogrid.org/File/BioGRID/Latest-Release/BIOGRID-PROJECT-covid19_coronavirus_project-LATEST.zip

## [1] 29710    37

IntAct

https://www.ebi.ac.uk/intact/home

https://www.ebi.ac.uk/intact/search?query=annot:%22dataset:coronavirus%22

The page display matches PSI-MI TAB, but it cannot be downloaded; only the PSIMI XML3 format is provided.

Use the FULL dataset in PSI-MI TAB format.

## [1] 1194447      42
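For reference, PSI-MI TAB cells pack several values into one field, separated by `|`, each written as `db:id` with an optional description in parentheses, and `-` marking an empty cell. The sketch below parses such cells under simplifying assumptions (no `|` or `(` inside quoted values); the example line is hand-written with placeholder accessions and only fills the first 12 columns.

```python
def parse_mitab_field(field):
    """Split one MITAB cell into (database, identifier) pairs.

    Simplification: quoted values are assumed not to contain '|' or '('.
    """
    if field in ("", "-"):
        return []
    pairs = []
    for item in field.split("|"):
        db, _, value = item.partition(":")
        value = value.split("(", 1)[0].strip('"')
        pairs.append((db, value))
    return pairs


# Hand-written example with placeholder accessions, not an IntAct record.
FIELDS = [
    "uniprotkb:X00001",
    "uniprotkb:X00002|intact:EBI-0000",
    "-", "-", "-", "-",
    'psi-mi:"MI:0000"(example method)',
    "-", "-",
    "taxid:9606(human)",
    "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)',
]
line = "\t".join(FIELDS)

cols = line.split("\t")
record = {
    "id_a": parse_mitab_field(cols[0]),
    "id_b": parse_mitab_field(cols[1]),
    "taxid_a": parse_mitab_field(cols[9]),
    "interaction_type": parse_mitab_field(cols[11]),
}
print(record)
```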

Parsing the IntAct COVID-19 PSIMI XML2.5

The data primarily covers protein-protein and several RNA-protein interactions involving SARS-CoV-2 and SARS-CoV. All interactions from the relevant publications are covered in this dataset, including interactions with other organisms.

## 1 Entries found
## Parsing entry 1 
##   Parsing experiments: ..
##   Parsing interactors:
##   Parsing interactions:
## ......
## 1 Entries found
## Parsing entry 1 
##   Parsing experiments: ..
##   Parsing interactors:
##   Parsing interactions:
## ......