以下代码需要加载 GSEABase 包:
读取gmt文件看看GeneSetCollection 对象
自己去下载 h.all.v7.2.symbols.gmt 文件哦,在msigdb数据库可以找到。
代码如下:
gmtfile ='../MSigDB/symbols/h.all.v7.2.symbols.gmt'
geneset <- getGmt( gmtfile )
geneset
得到了 GeneSetCollection 对象 :
> geneset
GeneSetCollection
names: HALLMARK_TNFA_SIGNALING_VIA_NFKB, HALLMARK_HYPOXIA, ..., HALLMARK_PANCREAS_BETA_CELLS (50 total)
unique identifiers: JUNB, CXCL2, ..., SRP14 (4383 total)
types in collection:
geneIdType: NullIdentifier (1 total)
collectionType: NullCollection (1 total)
起初看到 GeneSetCollection 对象肯定是一脸懵逼的,不过我们可以慢慢来熟悉这个 GeneSetCollection 对象哦。
看看 GeneSetCollection 的帮助文档
GeneSetCollection-methods {GSEABase} R Documentation
Methods to construct GeneSetCollection instances
Description
Use GeneSetCollection to construct a collection of gene sets from GeneSet arguments, or a list of GeneSets.
Usage
GeneSetCollection(object, ..., idType, setType)
附带一个很精彩的例子:
## Not run:
## from KEGG identifiers, for example
library(KEGG.db)
lst <- head(as.list(KEGGEXTID2PATHID))
gsc <- GeneSetCollection(mapply(function(geneIds, keggId) {
GeneSet(geneIds, geneIdType=EntrezIdentifier(),
collectionType=KEGGCollection(keggId),
setName=keggId)
}, lst, names(lst)))
也就是说,只需要自己构建好一个list,如下所示:
> lst
$`10`
[1] "hsa00232" "hsa00983" "hsa01100"
$`100`
[1] "hsa00230" "hsa01100" "hsa05340"
$`1000`
[1] "hsa04514" "hsa05412"
就可以很方便的构建好我们需要的 GeneSetCollection 对象 :
> gsc
GeneSetCollection
names: 10, 100, ..., 100000110 (6 total)
unique identifiers: hsa00232, hsa00983, ..., dre04080 (44 total)
types in collection:
geneIdType: EntrezIdentifier (1 total)
collectionType: KEGGCollection (1 total)
有意思的是,作者这里举例,居然是把基因作为基因集,但是kegg的pathway当做是基因来处理,不过问题不大哦。
那么我们自己创造
首先需要自己拿到感兴趣的基因集,我这里呢,用线粒体核糖体基因举例:
library(org.Hs.eg.db)
s=unique(toTable(org.Hs.egSYMBOL)[,2])
head(s)
mt.genes <- s[grep("^MT",s)];
head(mt.genes) # 这里有问题
rb.genes <- s[grep("^RP[SL]",s)];
head(rb.genes)
lst=list(mt.genes=mt.genes,
rb.genes=rb.genes )
gsc <- GeneSetCollection(mapply(function(geneIds, keggId) {
GeneSet(geneIds, geneIdType=EntrezIdentifier(),
collectionType=KEGGCollection(keggId),
setName=keggId)
}, lst, names(lst)))
gsc
得到如下所示的:
> gsc
GeneSetCollection
names: mt.genes, rb.genes (2 total)
unique identifiers: MTOR, MT1A, ..., RPS6KB2-AS1 (2666 total)
types in collection:
geneIdType: EntrezIdentifier (1 total)
collectionType: KEGGCollection (1 total)