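Breast cancer is a highly heterogeneous disease: patients with the same clinical stage and pathological grade can differ widely in treatment response and prognosis. Even so, adjuvant therapy (chemotherapy, endocrine therapy, anti-HER2 therapy, and so on) is still chosen mainly from clinicopathological features such as HER2 expression, estrogen receptor status, tumor size, grade, and lymph node metastasis.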
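Once high-throughput expression data became available, many classification methods appeared. Compared with traditional TNM staging and clinicopathological indicators, multigene predictors can provide more accurate prognostic information and a more reliable basis for choosing treatment, which makes them an important direction for precision oncology.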
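The two multigene assays most worth studying are the Oncotype DX 21-gene assay and the FDA-cleared MammaPrint 70-gene assay. Attempts by other researchers are also worth reviewing: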
- three variants of the Single Sample Predictor (SSP) (SSP2003 [10], SSP2006 [11] and PAM50 [12])
- Subtype Classification Model (SCM) (SCMOD1 [7] and SCMOD2 [8]), and the simple three-gene model (SCMGENE [9])
With so many classification methods, there are naturally many papers comparing them.
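At present the best-known classifier is PAM50, published in J Clin Oncol. 2009 Mar 10; 27(8): 1160–1167, which has been cited nearly 3,000 times. The study used microarray expression data from 189 prototype samples to build a 50-gene subtype predictor that assigns the gene expression–based "intrinsic" subtypes luminal A, luminal B, HER2-enriched, and basal-like. The authors also validated the classifier: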
Test sets from 761 patients (no systemic therapy) were evaluated for prognosis, and 133 patients were evaluated for prediction of pathologic complete response (pCR) to a taxane and anthracycline regimen.
The expression arrays used are fairly niche: Agilent human 1Av2 microarrays or custom-designed Agilent human 22k arrays. The data set was deposited as GSE10886.

PAM50-related data sets
Following our GEO tutorial (video + code: https://github.com/jmzeng1314/GEO), you can process the GSE10886 data set.
    rm(list = ls())                    # clear the workspace
    options(stringsAsFactors = FALSE)  # keep as.data.frame from turning character columns into factors
    # Check the size of the downloaded file and inspect the data
    f <- 'GSE10886_eSet.Rdata'
    # https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10886
    library(GEOquery)
    # Two GEOquery options to be aware of; the automatic defaults are usually sufficient.
    # Setting options('download.file.method.GEOquery'='auto')
    # Setting options('GEOquery.inmemory.gpl'=FALSE)
    if (!file.exists(f)) {
      gset <- getGEO('GSE10886', destdir = ".",
                     AnnotGPL = FALSE,  # annotation file
                     getGPL = FALSE)    # platform file
      save(gset, file = f)              # cache locally
    }
    load(f)           # load the cached data
    class(gset)       # check the object type
    length(gset)      # number of elements in gset
    class(gset[[1]])
The centroids, gene lists, and R code to produce the classification are all available along with the clinical information for the training set on this page: https://genome.unc.edu/pubsup/breastGEO/
Specifically, the R code and supporting data files are here: https://genome.unc.edu/pubsup/breastGEO/PAM50.zip
And the centroids alone are here: https://genome.unc.edu/pubsup/breastGEO/pam50_centroids.txt
In addition, this document provides additional information regarding classification of the PAM50 plus Claudin-low calls https://genome.unc.edu/pubsup/breastGEO/Guide%20to%20Intrinsic%20Subtyping%209-6-10.pdf
Anyone running PAM50 (or any classifier based on relative measurements such as expression) should understand the concepts in this paper: http://www.breast-cancer-research.com/content/pdf/s13058-015-0520-4.pdf
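To make the centroid idea concrete, here is a minimal sketch of nearest-centroid subtype assignment in base R. It assumes the commonly described approach of median-centering each gene across the cohort and then assigning each sample to the centroid with the highest Spearman correlation; it is not the code from PAM50.zip. The six-gene centroid values and the function name `classify_nearest_centroid` are invented for illustration, and the real values are in pam50_centroids.txt.

```r
# Toy centroids: 6 genes x 4 subtypes. The numbers are invented for
# illustration only; the published values are in pam50_centroids.txt.
genes <- c("ESR1", "ERBB2", "MKI67", "KRT5", "FOXA1", "CCNB1")
centroids <- matrix(
  c( 1.2, -0.4, -0.9, -0.2,  0.9, -0.7,   # LumA
     0.5,  0.4,  0.3, -0.8,  0.1,  0.6,   # LumB
    -0.5,  1.5, -0.3,  0.2, -0.1, -0.6,   # Her2
    -1.3, -0.6,  0.9,  1.4, -1.2,  1.0),  # Basal
  nrow = length(genes),
  dimnames = list(genes, c("LumA", "LumB", "Her2", "Basal"))
)

# Hypothetical helper: expr is a genes x samples matrix with gene symbols
# as row names.
classify_nearest_centroid <- function(expr, centroids) {
  common <- intersect(rownames(expr), rownames(centroids))
  x <- expr[common, , drop = FALSE]
  # Median-center each gene across the cohort so values are relative
  centered <- sweep(x, 1, apply(x, 1, median))
  # Spearman correlation of every sample (rows) with every centroid (columns)
  rho <- cor(centered, centroids[common, , drop = FALSE], method = "spearman")
  data.frame(sample  = colnames(x),
             subtype = colnames(rho)[max.col(rho, ties.method = "first")],
             max_rho = apply(rho, 1, max),
             row.names = NULL)
}

# Toy cohort: each sample is one centroid shifted by a constant baseline
expr <- centroids + 8
colnames(expr) <- paste0("S", 1:4)
calls <- classify_nearest_centroid(expr, centroids)
print(calls)
```

Because the centering step uses the cohort median, the same tumor can receive a different call when the composition of the cohort changes, which is why the relative-measurement caveat above matters.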
You can download the PAM50, Sorlie500, and Hu306 gene sets from the supplementary data of this paper: Breast cancer molecular profiling with single sample predictors: a retrospective analysis. http://www.ncbi.nlm.nih.gov/pubmed/20181526 Alternatively, use the genefu package from Bioconductor: http://www.bioconductor.org/packages/2.12/bioc/manuals/genefu/man/genefu.pdf

Papers evaluating the clinical performance of PAM50
First: A Comparison of PAM50 Intrinsic Subtyping with Immunohistochemistry and Clinical Prognostic Factors in Tamoxifen-Treated Estrogen Receptor–Positive Breast Cancer. It covers only estrogen receptor (ER)–positive breast cancers and compares clinical, immunohistochemical (IHC), and PAM50 classifications.
Another example: BMC Medical Genomics 2012 (https://doi.org/10.1186/1755-8794-5-44), titled PAM50 Breast Cancer Subtyping by RT-qPCR and Concordance with Standard Clinical Molecular Markers, compares the concordance between PAM50 and IHC results.
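One common way to summarize concordance between two sets of subtype calls is the raw agreement rate together with Cohen's kappa, which corrects for chance agreement. The sketch below is a generic illustration in base R, not the analysis from the paper; the ten pairs of calls and the helper name `concordance` are invented.

```r
subtypes <- c("LumA", "LumB", "Her2", "Basal")

# Hypothetical calls for 10 tumors, invented for illustration
pam50 <- c("LumA", "LumA", "LumA", "LumB", "LumB",
           "Her2", "Her2", "Basal", "Basal", "Basal")
ihc   <- c("LumA", "LumA", "LumB", "LumB", "LumB",
           "Her2", "Basal", "Basal", "Basal", "LumA")

# Hypothetical helper: cross-tabulate two call vectors and compute
# observed agreement and Cohen's kappa.
concordance <- function(a, b, levels) {
  tab <- table(factor(a, levels = levels), factor(b, levels = levels))
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                       # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  list(table = tab, agreement = po, kappa = (po - pe) / (1 - pe))
}

res <- concordance(pam50, ihc, subtypes)
print(res$table)
print(res$agreement)
print(res$kappa)
```

Fixing the factor levels up front keeps the table square even when one method never assigns a given subtype.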
And also: It has recently been proposed that a three-gene model (SCMGENE) that measures ESR1, ERBB2, and AURKA identifies the major breast cancer intrinsic subtypes and provides robust discrimination for clinical use in a manner very similar to a 50-gene subtype predictor.

PAM50 classification as a machine learning prediction target
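Usually, once we have transcriptome expression data, we can use the PAM50 classifier to determine the breast cancer subtype. But suppose we also have other patient variables, such as age and ER, PR, HER2, and Ki67 status. We can then train a machine learning model that maps these variables to the PAM50 subtype call.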
Then for new patients, who may lack transcriptome data but will usually have age and ER, PR, HER2, and Ki67 status recorded, the trained model can predict their PAM50 subtype.
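A minimal sketch of this idea, assuming a multinomial logistic regression as the model (the cited papers may use other methods). All data are simulated: the marker probabilities per subtype are invented, and the column names `ER`, `PR`, `HER2`, and `Ki67_high` are placeholders for whatever the clinical table actually provides.

```r
library(nnet)  # recommended package bundled with standard R installations

set.seed(1)
n <- 400
lv <- c("LumA", "LumB", "Her2", "Basal")
subtype <- factor(sample(lv, n, replace = TRUE), levels = lv)

# Simulated marker positivity rates per subtype (invented, not estimated)
p_er   <- c(LumA = 0.95, LumB = 0.90, Her2 = 0.30, Basal = 0.05)
p_her2 <- c(LumA = 0.05, LumB = 0.25, Her2 = 0.85, Basal = 0.05)
p_ki67 <- c(LumA = 0.10, LumB = 0.80, Her2 = 0.70, Basal = 0.85)
s <- as.character(subtype)

train <- data.frame(
  subtype   = subtype,  # in real use: the PAM50 call from expression data
  age       = round(rnorm(n, mean = 58, sd = 11)),
  ER        = rbinom(n, 1, p_er[s]),
  PR        = rbinom(n, 1, 0.8 * p_er[s]),
  HER2      = rbinom(n, 1, p_her2[s]),
  Ki67_high = rbinom(n, 1, p_ki67[s])
)

# Train on patients who have both clinical variables and a PAM50 call
fit <- multinom(subtype ~ age + ER + PR + HER2 + Ki67_high,
                data = train, trace = FALSE)

# New patients with clinical variables only
new_patients <- data.frame(age = c(45, 67), ER = c(0, 1), PR = c(0, 1),
                           HER2 = c(0, 0), Ki67_high = c(1, 0))
pred_class <- predict(fit, newdata = new_patients, type = "class")
pred_prob  <- predict(fit, newdata = new_patients, type = "probs")
print(pred_class)
print(round(pred_prob, 3))
```

Returning class probabilities alongside the hard call is useful here, because clinical markers separate some subtypes (for example luminal A versus luminal B) much less cleanly than expression data does.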
References:
- Determining breast cancer histological grade from RNA-sequencing data. Breast Cancer Res. 2016
- Assessment of breast cancer risk factors reveals subtype heterogeneity. Cancer Res. 2017
Extracting 11 genes from PAM50 as a proliferation signature
The paper was published in 2010 and has a rather long title: A comparison of PAM50 intrinsic subtyping with immunohistochemistry and clinical prognostic factors in tamoxifen-treated estrogen receptor-positive breast cancer.
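The authors used plain quantitative real-time reverse transcription-PCR (qRT-PCR) to measure only the 50 target genes in 786 patients, classified the patients by PAM50, and listed an 11-gene proliferation signature and an 8-gene luminal signature in the supplementary material.

A patient-education site's introduction to the PAM50 subtypes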
https://www.breastcancer.org/symptoms/types/molecular-subtypes
There are five main intrinsic or molecular subtypes of breast cancer that are based on the genes a cancer expresses:
- Luminal A breast cancer is hormone-receptor positive (estrogen-receptor and/or progesterone-receptor positive), HER2 negative, and has low levels of the protein Ki-67, which helps control how fast cancer cells grow. Luminal A cancers are low-grade, tend to grow slowly and have the best prognosis.
- Luminal B breast cancer is hormone-receptor positive (estrogen-receptor and/or progesterone-receptor positive), and either HER2 positive or HER2 negative with high levels of Ki-67. Luminal B cancers generally grow slightly faster than luminal A cancers and their prognosis is slightly worse.
- Triple-negative/basal-like breast cancer is hormone-receptor negative (estrogen-receptor and progesterone-receptor negative) and HER2 negative. This type of cancer is more common in women with BRCA1 gene mutations. Researchers aren’t sure why, but this type of cancer also is more common among younger and African-American women.
- HER2-enriched breast cancer is hormone-receptor negative (estrogen-receptor and progesterone-receptor negative) and HER2 positive. HER2-enriched cancers tend to grow faster than luminal cancers and can have a worse prognosis, but they are often successfully treated with targeted therapies aimed at the HER2 protein, such as Herceptin (chemical name: trastuzumab), Perjeta (chemical name: pertuzumab), Tykerb (chemical name: lapatinib), and Kadcyla (chemical name: T-DM1 or ado-trastuzumab emtansine).
- Normal-like breast cancer is similar to luminal A disease: hormone-receptor positive (estrogen-receptor and/or progesterone-receptor positive), HER2 negative, and has low levels of the protein Ki-67, which helps control how fast cancer cells grow. Still, while normal-like breast cancer has a good prognosis, its prognosis is slightly worse than luminal A cancer’s prognosis.
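The marker patterns in the list above can be written as a small rule-based lookup. This is only a sketch of the receptor-status surrogates as described on that page, not a substitute for expression-based PAM50 calls; the function name `ihc_surrogate` and the binary `ki67_high` input are assumptions. Normal-like tumors share the luminal A marker pattern in the list, so these rules cannot separate the two.

```r
# Hypothetical helper: all inputs are logical vectors of equal length.
ihc_surrogate <- function(er, pr, her2, ki67_high) {
  hr_pos <- er | pr  # hormone-receptor positive if ER and/or PR is positive
  ifelse(hr_pos & !her2 & !ki67_high, "Luminal A",
  ifelse(hr_pos & (her2 | ki67_high), "Luminal B",
  ifelse(!hr_pos & her2,              "HER2-enriched",
                                      "Triple-negative/basal-like")))
}

# Five made-up marker profiles
calls <- ihc_surrogate(
  er        = c(TRUE,  TRUE,  TRUE,  FALSE, FALSE),
  pr        = c(TRUE,  FALSE, TRUE,  FALSE, FALSE),
  her2      = c(FALSE, FALSE, TRUE,  TRUE,  FALSE),
  ki67_high = c(FALSE, TRUE,  FALSE, TRUE,  TRUE)
)
print(calls)
```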
About MammaPrint
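In 2002, researchers at the Netherlands Cancer Institute developed a multigene assay for breast cancer. Using cDNA microarrays, they measured RNA expression in fresh-frozen tissue from 78 patients who were lymph node negative, younger than 55 years, and had tumors smaller than 5 cm. From 25,000 candidate genes they selected 70 target genes related to cell proliferation, invasion, metastasis, angiogenesis, and similar processes, which together make up the MammaPrint assay.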
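Based on the 5-year and 10-year risk of distant recurrence, and on the correlation between gene expression and clinical outcome, patients are divided into a good-prognosis group and a poor-prognosis group.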
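Several studies have confirmed the prognostic value of the MammaPrint 70-gene assay in early breast cancer. Researchers from multiple European countries ran the EORTC 10041/BIG 3-04 MINDACT trial to further analyze how the 70-gene assay affects adjuvant chemotherapy decisions. The results were published in The New England Journal of Medicine in 2016. The trial enrolled 6,693 patients with early breast cancer. Frozen tumor tissue was tested with the 70-gene assay to determine genomic risk, and the Adjuvant! Online v8.0 clinicopathological tool was used to determine clinical risk.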