Nov 23



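I had assumed I would need to upload my own genes to the STRING website for analysis. It turns out the STRING team has also developed an R package, STRINGdb (homepage: http://www.bioconductor.org/packages/release/bioc/html/STRINGdb.html). I prefer solving problems by programming, so I learned this package, and it is very easy to use!
All it needs is a three-column data.frame of logFC, p-value and gene ID. That is exactly the standard output of a differential expression analysis.
Next, string_db$map adds a column holding the STRING protein ID, and string_db$add_diff_exp_color adds a column holding a color derived from logFC.
string_db$plot_network draws the network and needs only the STRING protein IDs. To give proteins different colors, first use string_db$post_payload to attach each color to its protein, then draw the network.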
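The workflow above can be sketched in R roughly as follows. This is a minimal sketch, not the author's script. It assumes STRINGdb is installed from Bioconductor and that there is network access. It also assumes the input columns are named `logFC`, `pvalue` and `gene`, which are the names used in the STRINGdb vignette. The STRING version, score threshold, p-value cutoff, and the helper names `check_de_table` and `plot_de_network` are hypothetical choices for illustration.

```r
# Sketch of the STRINGdb workflow (assumes Bioconductor STRINGdb + network access).

# Stop early if the differential-expression table lacks the expected columns.
check_de_table <- function(de, cols = c("logFC", "pvalue", "gene")) {
  missing_cols <- setdiff(cols, colnames(de))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical helper: map genes, colour them by logFC, upload the colours
# as a payload and plot the network of the top-ranked proteins.
plot_de_network <- function(de, species = 9606, version = "12.0",
                            score_threshold = 400, p_cutoff = 0.05,
                            n_top = 100) {
  check_de_table(de)
  # version must be one supported by the installed STRINGdb release (assumption)
  string_db <- STRINGdb::STRINGdb$new(version = version, species = species,
                                      score_threshold = score_threshold,
                                      input_directory = "")
  # Adds a STRING_id column; rows whose gene cannot be mapped are dropped.
  mapped <- string_db$map(de, "gene", removeUnmappedRows = TRUE)
  # Adds a "color" column computed from logFC for the significant rows.
  signif <- mapped[mapped$pvalue < p_cutoff, ]
  colored <- string_db$add_diff_exp_color(signif, logFcColStr = "logFC")
  # Sends the protein -> colour assignment to STRING; returns a payload id.
  payload_id <- string_db$post_payload(colored$STRING_id,
                                       colors = colored$color)
  # Plots the n_top proteins with the smallest p-values.
  hits <- head(mapped$STRING_id[order(mapped$pvalue)], n_top)
  string_db$plot_network(hits, payload_id = payload_id)
}
```

A call would look like `plot_de_network(my_de_table)`. Here `my_de_table` is a hypothetical result table, for example from limma or DESeq2, with its columns renamed to match.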




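STRING is the most complete and most popular database in the PPI field. If you search Google for PPI, STRING's official site is the first thing you see. Its homepage is now built in HTML5 and looks quite polished: http://string-db.org/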








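STRING identifies human proteins by Ensembl protein IDs, and org.Hs.eg.db links these to Entrez gene IDs:
> library(org.Hs.eg.db)
> tmp=toTable(org.Hs.egENSEMBLPROT)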
> dim(tmp)
[1] 110916      2
> head(tmp)
  gene_id         prot_id
1       1 ENSP00000263100
2       1 ENSP00000470909
3       2 ENSP00000443302
4       2 ENSP00000323929
5       2 ENSP00000438599
6       2 ENSP00000445717
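The mapped protein IDs can be turned into STRING-style identifiers with base R. This is a minimal sketch. It assumes that human STRING IDs take the form `9606.` followed by the Ensembl protein ID. The rows are copied from the output above, and `my_genes` is a hypothetical gene list. Not every Ensembl protein ID is necessarily present in STRING, so some of these keys may not match.

```r
# Rows copied from the head(tmp) output above (Entrez gene ID -> Ensembl protein ID)
tmp <- data.frame(
  gene_id = c("1", "1", "2", "2", "2", "2"),
  prot_id = c("ENSP00000263100", "ENSP00000470909", "ENSP00000443302",
              "ENSP00000323929", "ENSP00000438599", "ENSP00000445717"),
  stringsAsFactors = FALSE
)

# Assumption: human STRING protein IDs are the NCBI taxonomy ID 9606,
# a dot, and the Ensembl protein ID.
tmp$string_id <- paste0("9606.", tmp$prot_id)

# Keep only the genes of interest (hypothetical list of Entrez gene IDs)
my_genes <- c("2")
my_proteins <- tmp[tmp$gene_id %in% my_genes, ]

# One gene can map to several Ensembl protein IDs
prots_per_gene <- table(tmp$gene_id)
print(prots_per_gene)
```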






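I recently had a project that required exploring the relationships between the genes in a gene list. That led me to databases of interactions between the genes' products, the proteins, and I found there are a great many of them!
A fairly comprehensive compendium of PPI databases can be found at http://www.pathguide.org/.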


A search on Pathguide returned 207 results in 9 categories.

The six major primary human PPI databases are BIND, BioGRID, DIP, HPRD, IntAct, and MINT. This comes from an analysis of human interactome PPI data that compares their coverage as integrated by the meta-database APID.
BIND: the Biomolecular Interaction Network Database (link dead)
DIP: the Database of Interacting Proteins http://dip.doe-mbi.ucla.edu/
MINT: the Molecular INTeraction database http://mint.bio.uniroma2.it/mint/
STRING: Search Tool for the Retrieval of Interacting Genes/Proteins http://string-db.org/
HPRD: Human Protein Reference Database http://www.hprd.org/
BioGRID: The Biological General Repository for Interaction Datasets http://thebiogrid.org/
(a) PPI definition; a definition of a protein-to-protein interaction compared to other biomolecular relationships or associations.
(b) PPI determination by two alternative approaches, binary and co-complex; a description of the PPIs determined by the two main types of experimental technologies.
(c) The main databases and repositories that include PPIs; a description and comparison of the main databases and repositories that include PPIs, indicating the type of data that they collect with a special distinction between experimental and predicted data.
(d) Analysis of coverage and ways to improve PPI reliability; a comparative study of the current coverage on PPIs and presentation of some strategies to improve the reliability of PPI data.
(e) Networks derived from PPIs compared to canonical pathways; a practical example that compares the characteristics and information provided by a canonical pathway and by the PPI network built for the same proteins. Finally, a short summary and guidance for further reading are provided.
There are four common approaches to PPI data expansion:
1) manual curation from the biomedical literature by experts;
2) automated PPI data extraction from biomedical literature with text mining methods;
3) computational inference based on interacting protein domains or co-regulation relationships, often derived from data in model organisms; and
4) data integration from various experimental or computational sources.
Partly because the quality of PPI data is hard to evaluate, most widely used PPI databases, including DIP, BIND, MINT, HPRD, and IntAct, take a "conservative approach" to PPI data expansion by adding only manually curated interactions. As a result, the protein interactome built this way has poor coverage.
In the second approach, literature mining, computer software replaces database curators in extracting protein interaction (or association) data from large volumes of biomedical literature. Because of the complexity of the natural language processing techniques involved, however, this approach often generates many false-positive protein "associations" that are not truly biologically significant "interactions".
The challenge for the integrative approach is how to balance quality with coverage.
In particular, different databases may contain much redundant PPI information derived from the same sources, while the overlap between independently derived PPI data sets is quite low.
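To make "redundant" and "overlap" concrete, here is a sketch in R. It treats each PPI as an undirected pair and builds one canonical key per pair. It then merges two sources without duplicates and measures their overlap as a Jaccard index. The two edge lists and protein names are hypothetical placeholders, not real database content. The Jaccard index is just one possible overlap measure.

```r
# Hypothetical toy edge lists; protein names are placeholders, not real data.
db1 <- data.frame(a = c("P1", "P2", "P3"), b = c("P2", "P3", "P4"),
                  stringsAsFactors = FALSE)
db2 <- data.frame(a = c("P2", "P5"), b = c("P1", "P6"),
                  stringsAsFactors = FALSE)

# Turn each undirected interaction into a canonical "min|max" key,
# so that P1-P2 and P2-P1 count as the same PPI.
edge_keys <- function(edges) {
  unique(paste(pmin(edges$a, edges$b), pmax(edges$a, edges$b), sep = "|"))
}

k1 <- edge_keys(db1)
k2 <- edge_keys(db2)
shared  <- intersect(k1, k2)                  # reported by both sources
merged  <- union(k1, k2)                      # integrated, redundancy-free set
jaccard <- length(shared) / length(merged)    # overlap between the sources
```

With these toy lists, P1-P2 appears in both sources (reversed in the second). The merged set therefore has 4 interactions, and the Jaccard index is 0.25.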
HIPPI, a database published in 2009: http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-10-S1-S16#CR6_2544 (it integrates the HPRD [11], BIND [20], MINT [21], STRING [26], and OPHID databases)