Downloading MSigDB database files from the Broad website is a hassle

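In two earlier tutorials (one borrowing visualization methods from the escape package for GSVA or ssGSEA result matrices, the other on running GSEA on a single-cell expression matrix), I mentioned that MSigDB (Molecular Signatures Database) defines collections of known gene sets: http://software.broadinstitute.org/gsea/msigdb . Downloading them requires registration.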

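However, the GitHub package ncborcherding/escape, documented at http://www.bioconductor.org/packages/release/bioc/vignettes/escape/inst/doc/vignette.html , ships a pre-packaged copy of the MSigDB data. If you read its documentation carefully, that packaging actually depends on msigdbr_7.2.1.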

Getting all gene sets in MSigDB

All gene sets in MSigDB (http://software.broadinstitute.org/gsea/msigdb) are bundled by the GitHub package ncborcherding/escape. They are organized into eight collections, H and C1 through C7:

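H: hallmark gene sets, (cancer) hallmark gene sets, 50 sets in total, the most commonly used;
C1: positional gene sets, grouped by chromosomal position, 326 sets in total, rarely used;
C2: curated gene sets, (expert-)curated from pathways, the literature, and similar sources;
C3: motif gene sets, mainly two parts: microRNA targets and transcription factor targets;
C4: computational gene sets, defined by mining cancer-related microarray data;
C5: GO gene sets, Gene Ontology, in three parts: BP (biological process), CC (cellular component), and MF (molecular function);
C6: oncogenic signatures, cancer signature gene sets, mostly derived from published microarray data in NCBI GEO;
C7: immunologic signatures, immune-related gene sets.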

library(escape)
# Fetch the hallmark (H) collection
GS <- getGeneSets(library = "H")
GS

The MSigDB gene sets are returned as a list of GSEABase GeneSet objects; the call above retrieves the hallmark gene sets.

The source is the msigdbr package

Installation is simple:

install.packages("msigdbr")

But msigdbr turned out to be smaller than I expected:

Installing package into ‘C:/Users/win10/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/msigdbr_7.2.1.zip'
Content type 'application/zip' length 6737651 bytes (6.4 MB)
downloaded 6.4 MB

package ‘msigdbr’ successfully unpacked and MD5 sums checked

As with any R package, reading the documentation is enough to learn it: https://cran.r-project.org/web/packages/msigdbr/vignettes/msigdbr-intro.html

Documentation for package ‘msigdbr’ version 7.2.1
DESCRIPTION file.
User guides, package vignettes and other documentation.
Help Pages
msigdbr Retrieve the gene sets data frame
msigdbr_collections List the collections available in the msigdbr package
msigdbr_show_species List the species available in the msigdbr package
msigdbr_species List the species available in the msigdbr package

The documentation is very simple

The code below speaks for itself, so there is not much more to explain:

library(msigdbr)
# All gene sets in the database can be retrieved without specifying a collection/category. 
all_gene_sets = msigdbr(species = "Mus musculus")
head(all_gene_sets)
msigdbr_species()
all_gene_sets = msigdbr(species = "Homo sapiens")
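
The data frame retrieved above has one row per gene per gene set, so counting rows does not count gene sets. Here is a minimal sketch of tallying gene sets per collection. It runs on a made-up toy data frame rather than the real database, and it assumes the column names gs_cat and gs_name match what msigdbr 7.2.1 returns (other versions may name them differently).

```r
# Toy stand-in for the data frame returned by msigdbr(); all values are made up.
# Assumption: column names gs_cat / gs_name / gene_symbol mirror msigdbr 7.2.1.
toy_sets <- data.frame(
  gs_cat      = c("H", "H", "H", "C2", "C2", "C5"),
  gs_name     = c("HALLMARK_A", "HALLMARK_A", "HALLMARK_B",
                  "PATHWAY_X", "PATHWAY_X", "GO_TERM_Y"),
  gene_symbol = c("TP53", "MYC", "EGFR", "TP53", "KRAS", "MYC"),
  stringsAsFactors = FALSE
)

# Each gene set spans several rows, so keep one row per
# (collection, gene set) pair before counting.
set_index <- unique(toy_sets[, c("gs_cat", "gs_name")])
sets_per_collection <- table(set_index$gs_cat)
print(sets_per_collection)
```

With the real data, replacing toy_sets with all_gene_sets should give the number of gene sets in each collection, provided the column names hold for the installed version.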

It all comes down to wrappers and objects: the escape package above provides the getGeneSets function, and msigdbr provides the msigdbr function.
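
Since msigdbr returns a plain data frame instead of GeneSet objects, a common next step is to reshape it into a named list with one character vector of genes per gene set, the form that tools such as fgsea or GSVA accept. The sketch below does this in base R on a made-up toy data frame. The column names gs_name and gene_symbol are assumed to match msigdbr 7.2.1, and the commented-out msigdbr call is only an illustration of where real data would come from.

```r
# Toy long-format table: one row per (gene set, gene); all values are made up.
# With the real package one might instead start from, e.g.:
#   hallmark_df <- msigdbr::msigdbr(species = "Homo sapiens", category = "H")
# (assumption: the `category` argument as in msigdbr 7.2.1)
hallmark_df <- data.frame(
  gs_name     = c("HALLMARK_A", "HALLMARK_A", "HALLMARK_A", "HALLMARK_B"),
  gene_symbol = c("TP53", "MYC", "TP53", "EGFR"),
  stringsAsFactors = FALSE
)

# split() groups the gene symbols by gene set name, giving a named list.
gene_set_list <- split(hallmark_df$gene_symbol, hallmark_df$gs_name)

# Drop repeated symbols within each set.
gene_set_list <- lapply(gene_set_list, unique)

str(gene_set_list)
```

The list names are the gene set names, so a single set can be pulled out with gene_set_list[["HALLMARK_A"]].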

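R: a cornerstone of bioinformatics

Be sure to watch the full 10 hours of teaching videos on Bilibili, together with the learning-roadmap materials in this GitHub repository: https://github.com/jmzeng1314/R_bilibili . Some excellent notes are also worth consulting, for example https://mubu.com/doc/2KUiSCfVsg

Beginner, 10 exercises: http://www.bio-info-trainee.com/3793.html
Intermediate requirements: http://www.bio-info-trainee.com/3750.html
Advanced, complete 20 exercises: http://www.bio-info-trainee.com/3415.html
Statistics topic, 30 exercises: http://www.bio-info-trainee.com/4385.html
Visualization topic, 30 exercises: http://www.bio-info-trainee.com/4387.html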
