学习使用genefu
这个包,首先需要安装它!
source("http://bioconductor.org/biocLite.R")
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
biocLite("genefu")
教程略微有点复杂:https://rdrr.io/bioc/genefu/f/inst/doc/genefu.pdf
这个包自带了5个乳腺癌芯片数据集,需要了解他们,可以看我在生信技能树发的教程:https://vip.biotrainee.com/d/689-5
包里面自带的数据集也非常多,可以在https://rdrr.io/bioc/genefu/man/ 查看列表,比较重要的如下:
gene70 | Function to compute the 70 genes prognosis profile (GENE70)… |
---|---|
gene76 | Function to compute the Relapse Score as published by Wang et… |
mod1 | Gene modules published in Desmedt et al. 2008 |
---|---|
mod2 | Gene modules published in Wirapati et al. 2008 |
scmgene.robust | Subtype Clustering Model using only ESR1, ERBB2 and AURKA… |
---|---|
scmod1.robust | Subtype Clustering Model using ESR1, ERBB2 and AURKA modules… |
scmod2.robust | Subtype Clustering Model using ESR1, ERBB2 and AURKA modules… |
sig.endoPredict | Signature used to compute the endoPredict signature as… |
---|---|
sig.gene70 | Signature used to compute the 70 genes prognosis profile… |
sig.gene76 | Signature used to compute the Relapse Score (GENE76) as… |
sig.genius | Gene Expression progNostic Index Using Subtypes (GENIUS) as… |
sig.ggi | Gene expression Grade Index (GGI) as published in Sotiriou et… |
sig.oncotypedx | Signature used to compute the OncotypeDX signature as… |
pam50 | PAM50 classifier for identification of breast cancer… |
---|---|
pik3cags | Function to compute the PIK3CA gene signature (PIK3CA-GS) |
上面列出的这些数据集都是可以打开看的:
library(genefu)
data(pam50)
data(pam50.scale)
data(pam50.robust)
data(scmod2.robust)
pam50
str(scmod2.robust, max.level=1)
最重要的功能就是根据已知基因集来对乳腺癌进行分子分型
所有的分型都是用molecular.subtyping
函数,预背了很多可以进行乳腺癌进行分子分型的基因集,比如大名鼎鼎的PAM50,下面是演示:
rm(list = ls())
library(breastCancerMAINZ)
library(breastCancerTRANSBIG)
library(breastCancerUPP)
library(breastCancerUNT)
library(breastCancerNKI)
data(transbig)
dd=transbig
cinfo <- colnames(pData(dd))
cinfo
ddata <- t(exprs(dd))
dannot <- featureData(dd)@data
ddemo <- phenoData(dd)@data
library(genefu)
SubtypePredictions<-molecular.subtyping(sbt.model = "scmod2",data = ddata,
annot = dannot,do.mapping = TRUE)
table(SubtypePredictions$subtype)
ddemo$SCMOD2<-SubtypePredictions$subtype
PAM50Preds<-molecular.subtyping(sbt.model = "pam50",data=ddata,
annot=dannot,do.mapping=TRUE)
table(PAM50Preds$subtype)
ddemo$PAM50<-PAM50Preds$subtype
分类结果如下:
> table(SubtypePredictions$subtype)
ER+/HER2- High Prolif ER+/HER2- Low Prolif ER-/HER2- HER2+
50 77 49 22
> table(PAM50Preds$subtype)
Basal Her2 LumB LumA Normal
45 26 45 78 4
需要懂这个数据集才能理解,还需要有一点乳腺癌研究背景知识。
成功分类后的信息,就可以用来做生存分析
# http://www.inside-r.org/r-doc/survival/survfit.coxph
library(survival)
data.for.survival.SCMOD2 <- ddemo[,c("e.os", "t.os", "SCMOD2","age")]
data.for.survival.PAM50 <- ddemo[,c("e.os", "t.os", "PAM50","age")]
# Remove patients with missing survival information
data.for.survival.SCMOD2 <- data.for.survival.SCMOD2[complete.cases(data.for.survival.SCMOD2),]
data.for.survival.PAM50 <- data.for.survival.PAM50[complete.cases(data.for.survival.PAM50),]
days.per.month <- 30.4368
days.per.year <- 365.242
data.for.survival.PAM50$months_to_death <- data.for.survival.PAM50$t.os / days.per.month
data.for.survival.PAM50$vital_status <- data.for.survival.PAM50$e.os == "1"
surv.obj.PAM50 <- survfit(Surv(data.for.survival.PAM50$months_to_death,
data.for.survival.PAM50$vital_status) ~ data.for.survival.PAM50$PAM50)
data.for.survival.SCMOD2$months_to_death <- data.for.survival.SCMOD2$t.os / days.per.month
data.for.survival.SCMOD2$vital_status <- data.for.survival.SCMOD2$e.os == "1"
surv.obj.SCMOD2 <- survfit(Surv(
data.for.survival.SCMOD2$months_to_death,
data.for.survival.SCMOD2$vital_status) ~ data.for.survival.SCMOD2$SCMOD2)
message("KAPLAN-MEIR CURVE - USING PAM50")
plot(main = "Surival Curves SCMOD2", surv.obj.SCMOD2,
col =c("#006d2c", "#8856a7","#a50f15", "#08519c"),lty = 1,lwd = 3,
xlab = "Time (months)",ylab = "Probability of Survival")
legend("topright",fill = c("#006d2c", "#8856a7","#a50f15", "#08519c"),
legend = c("Basal","Her2","LumA","LumB"),bty = "n")
plot(main = "Surival Curves PAM50", surv.obj.PAM50,
col =c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),lty = 1,lwd = 3,
xlab = "Time (months)",ylab = "Probability of Survival")
legend("topright",
fill = c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),
legend = c("Basal","Her2","LumA","LumB","Normal"),bty = "n")
也可以把这两个分类画在一张图上,或者添加统计学显著指标,你们自己尝试一下哦,我在生信技能树论坛也多次提到过,就不再赘述咯。
提示一下,可以使用 Cross-validated Partial Likelihood (cvpl) 模型来检验两个分类方法的预后判断情况。
更多分类标准
前面我们提到过,这个包最大的优点就是内置了一系列分类指标,如下;
Subtype Clustering Model using just the AURKA gene: scmgene.robust()
Subtype Clustering Model using just the ESR1 gene: scmgene.robust() Subtype Clustering Model using just the ERBB2 gene: scmgene.robust()
NPI: npi()
GGI: ggi()
GENIUS: genius()
EndoPredict: endoPredict()
OncotypeDx: oncotypedx()
TamR: tamr()
GENE70: gene70()
PIK3CA: pik3cags()
rorS: rorS()
基本上是,可以想怎么分类,就怎么分类了。