How to analyze classic cancer exome data

Cancer research methods: http://kidsgenomics.org/genomic-landscape-pediatric-cancers/

September 2012: WES study of 14 Chinese NSCLC patients

We performed exome sequencing of 14 non–small cell lung carcinomas (NSCLCs) with matched adjacent normal lung tissues extracted from Chinese patients.
https://academic.oup.com/carcin/article/33/9/1797/2463747
Capture kit used: Agilent’s SureSelect Human All Exon kit, targeting 38Mb of sequence from 212 911 exons and their flanking regions in approximately 20 000 genes.
After the standard analysis pipeline plus filtering, the study reported 3,321 high-confidence somatic mutations.

| Category | Mean mutations per sample | Percentage (%) |
| --- | --- | --- |
| Average of total somatic SNVs | 237.2 | |
| Exonic | 115.6 | 48.7 |
| Splicing | 2.9 | 1.2 |
| ncRNA | 24.1 | 10.2 |
| 5′/3′-UTR | 6.1 | 2.6 |
| Intronic | 54.1 | 22.8 |
| Upstream/downstream | 1.9 | 0.8 |
| Intergenic | 32.4 | 13.7 |
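
The percentage column can be recomputed from the mean counts as a quick sanity check. The sketch below is only an illustration: it copies the numbers from the table, and the function name `category_percentages` is made up for this example rather than taken from the paper.

```python
# Mean somatic SNV counts per sample, copied from the table above.
mean_counts = {
    "Exonic": 115.6,
    "Splicing": 2.9,
    "ncRNA": 24.1,
    "5'/3'-UTR": 6.1,
    "Intronic": 54.1,
    "Upstream/downstream": 1.9,
    "Intergenic": 32.4,
}
TOTAL_MEAN = 237.2  # "Average of total somatic SNVs" row
N_SAMPLES = 14      # number of tumor/normal pairs in the study


def category_percentages(counts, total):
    """Return each category's share of the total, in percent, to one decimal."""
    return {name: round(100 * value / total, 1) for name, value in counts.items()}


percentages = category_percentages(mean_counts, TOTAL_MEAN)
cohort_total = round(TOTAL_MEAN * N_SAMPLES)

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print("Cohort-wide somatic mutations:", cohort_total)
```

Multiplying the per-sample mean by the 14 samples gives back the cohort-wide count of 3,321 quoted above.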
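
106 sites were selected for Sanger sequencing validation. The paper concludes that MXRA5 (on chromosome X), the second most frequently mutated gene, is a newly discovered cancer-related gene.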

2017: ovarian cancer

We performed whole exome sequencing of 39 OCCC samples with 16 matching blood tissue samples.
Four hundred twenty-six genes had recurrent somatic mutations. Among the 39 samples, ARID1A (62%) and PIK3CA (51%) were frequently mutated
DOI: https://doi.org/10.1016/j.ajpath.2017.06.012

October 2015: WES study of 42 CRC patients

A CRC study by a Spanish research team, published in October 2015 in the AACR journal Clinical Cancer Research: http://clincancerres.aacrjournals.org/content/21/20/4709
It includes 42 tumors paired with adjacent tissue and identified 11,122 somatic SNVs. The number of SNVs identified in coding regions and flanking sequences was 4,725. The average was 117 mutations per sample. Only 137 SNVs were shared by two or more tumors. Of those, 112 were intronic or intergenic.
WES capture kit used: commercial kit Sure Select XT Human All Exon 50MB (Agilent).
Tumor exomes were sequenced at 60× coverage (2 × 75 bp reads), and exomes from adjacent tissues were sequenced at 40× (2 × 75 bp reads)
Usually only potentially functional mutations are considered: Only the potentially functional SNVs, that is, coding nonsynonymous, stop gain, stop lost, splice-5′, splice-3′, coding synonymous near splice site, 3′-untranslated region (UTR), and 5′-UTR variants, were analyzed.
Annotation can draw on a broader set of databases: “KEGG,” “Biocarta,” “Reactome,” and “GO”.
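
The functional filter described above amounts to keeping variants whose annotated effect falls in a fixed set of classes. The following is a minimal sketch under stated assumptions: the effect labels (for example `"coding-nonsynonymous"`) and the record layout are hypothetical, since real annotators each use their own vocabulary, and the example records are invented.

```python
# Effect classes kept in the study, as listed above.
# The label spellings here are hypothetical; map them to your annotator's terms.
FUNCTIONAL_EFFECTS = {
    "coding-nonsynonymous",
    "stop-gain",
    "stop-lost",
    "splice-5",
    "splice-3",
    "coding-synonymous-near-splice",
    "utr-3",
    "utr-5",
}


def keep_functional(variants):
    """Return only the variants whose 'effect' label is in FUNCTIONAL_EFFECTS."""
    return [v for v in variants if v.get("effect") in FUNCTIONAL_EFFECTS]


# Invented records for illustration only.
example = [
    {"id": "var1", "effect": "coding-nonsynonymous"},
    {"id": "var2", "effect": "intron"},
    {"id": "var3", "effect": "utr-3"},
    {"id": "var4", "effect": "intergenic"},
]
kept = keep_functional(example)
print([v["id"] for v in kept])
```

Keeping the class list in one set makes it easy to tighten or relax the definition of "potentially functional" without touching the filtering logic.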

January 2015: 12 normal/tumor pairs of African American CRC

A CRC study by NIH researchers, published in January 2015 in the journal Cancer:
Identification of Novel Mutations by Exome Sequencing in African American Colorectal Cancer Patients
There is also a paper that integrates three somatic mutation calling tools:
A three-caller pipeline for variant analysis of cancer whole-exome sequencing data, published March 2017.
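
One common way to combine several somatic callers is a simple vote: keep a variant if enough callers report it. The sketch below illustrates that idea only. The "at least two of three" rule, the caller names, and the variant tuples are all assumptions for illustration; the three-caller paper's actual integration rule may differ.

```python
from collections import Counter


def consensus_calls(call_sets, min_callers=2):
    """Keep variants reported by at least `min_callers` of the given call sets.

    Each call set is an iterable of hashable variant keys,
    here (chrom, pos, ref, alt) tuples.
    """
    support = Counter()
    for calls in call_sets:
        support.update(set(calls))  # count each caller at most once per variant
    return {variant for variant, n in support.items() if n >= min_callers}


# Invented calls from three hypothetical callers.
caller_a = [("chr1", 1000, "A", "T"), ("chr2", 2000, "G", "C")]
caller_b = [("chr1", 1000, "A", "T"), ("chr3", 3000, "C", "A")]
caller_c = [("chr1", 1000, "A", "T"), ("chr2", 2000, "G", "C")]

consensus = consensus_calls([caller_a, caller_b, caller_c])
print(sorted(consensus))
```

Raising `min_callers` to 3 gives the strict intersection, which trades sensitivity for fewer false positives.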

2018: spatial tumor heterogeneity in intrahepatic cholangiocarcinoma

Paper: Spatial and temporal clonal evolution of intrahepatic cholangiocarcinoma. Published in J Hepatol (impact factor 12.486), 2018 Mar 15; it analyzed 69 spatially distinct regions from 6 operable ICCs.
Sequencing depth is quite good: WES achieved a mean coverage of 100X on those 69 low-passage PDPCs and matched peripheral blood samples, as well as 300X on the corresponding tumor tissues.
Variants identified: We identified a total of 1,596 non-silent mutations, including 1,312 missenses, 85 nonsenses, 25 splice-site variants and 174 insertions or deletions in those 69 PDPCs.

2018-Hepatocellular-cholangiocarcinoma (H-ChC)

Published in Nat Commun (impact factor 12.124), 2018 Mar 1;9(1):894. doi: 10.1038/s41467-018-03276-y. Paper: Whole-exome sequencing reveals the origin and evolution of hepato-cholangiocarcinoma.
The sample selection is somewhat unusual:

  • Tumor samples from 15 H-ChC patients were selected for immunohistochemical (IHC) staining.
  • Tumor specimens from 32 HCC patients and 28 iCCA patients were used as controls.
Background knowledge of this disease is needed here!
75 patients were recruited at Peking Union Medical College Hospital:
  • Fifteen patients were pathologically diagnosed with combined hepatocellular cholangiocarcinoma (H-ChC)
  • 32 were diagnosed with hepatocellular carcinoma (HCC),
  • 28 with intrahepatic cholangiocarcinoma (iCCA).
Sequencing data: EGAS00001002783
Four databases were used to look for driver mutations: Cancer Gene Census (CGC513), Bert Vogelstein125, SMG127, and Comprehensive 435.

WES of mouse models of triple-negative breast cancer

Published in Cancer Discov (impact factor 20.011), 2018 Mar;8(3):354-369. doi: 10.1158/2159-8290.CD-17-0679. Epub 2017 Dec 4. Paper: Identifying and Targeting Sporadic Oncogenic Genetic Aberrations in Mouse Models of Triple-Negative Breast Cancer.

Exome sequencing of cfDNA

Published in Clin Cancer Res (impact factor 9.619), 2018 Feb 15;24(4):939-949. doi: 10.1158/1078-0432.CCR-17-1586. Epub 2017 Nov 30. Paper: Whole-Exome Sequencing of Cell-Free DNA Reveals Temporo-spatial Heterogeneity and Identifies Treatment-Resistant Clones in Neuroblastoma.
Data: EGAS00001002705

Comparing CNV calls from WES with other methods

Published in Clin Cancer Res (impact factor 9.619), 2017 Oct 15;23(20):6070-6077. doi: 10.1158/1078-0432.CCR-17-0972. Epub 2017 Jul 27. Paper: Gene Copy Number Estimation from Targeted Next-Generation Sequencing of Prostate Cancer Biopsies: Analytic Validation and Clinical Qualification.
We evaluated CNVkit for CNA identification from amplicon-based targeted NGS in a cohort of 110 fresh castration-resistant prostate cancer biopsies and used capture-based whole-exome sequencing (WES), array comparative genomic hybridization (aCGH), and FISH to explore the viability of this approach.
There is more literature as well.

2017: cfDNA exome sequencing in HCC

Published in J Hepatol (impact factor 12.486), 2017 Aug;67(2):293-301. doi: 10.1016/j.jhep.2017.03.005. Epub 2017 Mar 18. Paper: Circumventing intratumoral heterogeneity to identify potential therapeutic targets in hepatocellular carcinoma.
Sequencing strategy: Whole exome sequencing (WES) and targeted deep sequencing (TDS) were carried out in 32 multi-regional tumor samples from five patients. Matched preoperative cfDNAs were sequenced accordingly.
Sequencing depth is decent: WES was conducted at a mean depth of 211.3× (153.1× to 255.3×) using adjacent liver tissue as normal control.
The resulting somatic variants: A total of 1,220 high confidence (HC) somatic SNVs (mutated allele frequency (MAF) ⩾5% in tumor and ⩽0.5% in normal control) involving 581 genes were identified, including 789 non-synonymous and 252 synonymous SNVs, and 179 SNVs in untranslated regions.
Data: EGAS00001002207
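
The high-confidence rule quoted above (mutated allele frequency at least 5% in tumor and at most 0.5% in the normal control) is easy to express as a filter on read counts. This is a minimal sketch, not the authors' code: the function names are made up, and rejecting sites with zero depth is an assumption of this example.

```python
def allele_fraction(alt_reads, depth):
    """Fraction of reads supporting the mutant allele."""
    return alt_reads / depth


def is_high_confidence(tumor_alt, tumor_depth, normal_alt, normal_depth,
                       min_tumor_af=0.05, max_normal_af=0.005):
    """Apply the tumor >= 5% / normal <= 0.5% allele-frequency rule.

    Sites with no coverage in either sample are rejected (an assumption
    of this sketch, since the frequency is undefined there).
    """
    if tumor_depth <= 0 or normal_depth <= 0:
        return False
    return (allele_fraction(tumor_alt, tumor_depth) >= min_tumor_af
            and allele_fraction(normal_alt, normal_depth) <= max_normal_af)


# Invented read counts for illustration.
print(is_high_confidence(21, 211, 0, 200))   # tumor ~10%, normal 0%
print(is_high_confidence(5, 211, 0, 200))    # tumor ~2.4%: below threshold
print(is_high_confidence(30, 200, 2, 200))   # normal 1%: above threshold
```

At the reported mean depth of about 211×, the 5% tumor threshold corresponds to roughly 11 supporting reads.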

2017: germline mutation analysis

Published in Gastroenterology (impact factor 18.392), 2017 Apr;152(5):983-986.e6. doi: 10.1053/j.gastro.2016.12.010. Epub 2016 Dec 23. Paper: Germline Mutations in PALB2, BRCA1, and RAD51C, Which Regulate DNA Recombination Repair, in Patients With Gastric Cancer.
We identified 11 cases with mutations in PALB2, BRCA1, or RAD51C genes, which regulate homologous DNA recombination. We found these mutations in 2 of 31 patients with HDGC (6.5%) and 9 of 331 patients with sporadic gastric cancer (2.8%).

Multi-omics lung cancer

Published in PLoS Med (impact factor 11.862), 2016 Dec 6;13(12):e1002162. doi: 10.1371/journal.pmed.1002162. eCollection 2016 Dec. Paper: Somatic Genomics and Clinical Features of Lung Adenocarcinoma: A Retrospective Study.
We performed an integrative genomic analysis, incorporating whole exome sequencing (WES), determination of DNA copy number and DNA methylation, and transcriptome sequencing for 101 LUAD samples from the Environment And Genetics in Lung cancer Etiology (EAGLE) study.

WES of early-stage lung adenocarcinoma with detailed clinical follow-up

Published in Ann Oncol (impact factor 11.855), 2017 Jan 1;28(1):75-82. doi: 10.1093/annonc/mdw436. Paper: Whole-exome sequencing and immune profiling of early-stage lung adenocarcinoma with fully annotated clinical follow-up.

Exploring male breast cancer risk genes with WES and targeted capture sequencing

Published in Cancer (impact factor 6.072), 2017 Jan 1;123(2):210-218. doi: 10.1002/cncr.30337. Epub 2016 Sep 20. Paper: Whole-exome sequencing and targeted gene sequencing provide insights into the role of PALB2 as a male breast cancer susceptibility gene.

Software for calling CNVs from tumor exome sequencing data

Paper: Enhanced copy number variants detection from whole-exome sequencing data using EXCAVATOR2.

Rare cancers

Published in Cancer Res (impact factor 9.122), 2016 Aug 15;76(16):4720-4727. doi: 10.1158/0008-5472.CAN-15-3134. Epub 2016 Jun 20. Paper: CSN1 Somatic Mutations in Penile Squamous Cell Carcinoma.
27 patients were enrolled: WES was performed on penile cancer and matched germline DNA from 27 patients undergoing surgical resection.
Results: Analysis revealed 810 genes containing somatic mutations among the 27 tumors (24 tumor germline pairs, 3 single tumors), with a mean somatic mutation rate of 30 per tumor.

Nevus-related cancer

Published in Am J Hum Genet (impact factor 9.025), 2016 May 5;98(5):1030-1037. doi: 10.1016/j.ajhg.2016.03.019. Paper: Somatic Mutations in NEK9 Cause Nevus Comedonicus.
Only paired WES from 3 patients was included, and the sequencing depth is moderate: generating average coverage > 90× for tissue samples and > 85× for blood samples.

2016: susceptibility genes for NPC

Published in Proc Natl Acad Sci U S A (impact factor 9.661), 2016 Mar 22;113(12):3317-22. doi: 10.1073/pnas.1523436113. Epub 2016 Mar 7. Paper: Whole-exome sequencing identifies MST1R as a genetic susceptibility gene in nasopharyngeal carcinoma.
The dataset is large: To identify genetic susceptibility genes for NPC, a whole-exome sequencing (WES) study was performed in 161 NPC cases and 895 controls of Southern Chinese descent.
There is an even larger validation cohort: The validation study, including 2,160 cases and 2,433 controls. With only tumor exome sequencing like the studies above, it is hard to analyze cancer susceptibility genes; large-scale population validation is required.
Blood samples were taken from the 161 NPC patients and sequenced to 49-fold on target (range of 32- to 76-fold). Patient categories:

  • 39 EAO cases
  • 63 FH+ cases from 52 independent families
  • 59 sporadic cases
Data for the 39 early-onset samples have been released: SRA291701.

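2016: WES study of NPC from the University of Hong Kong

From a University of Hong Kong research team, published in PNAS in October 2016: https://doi.org/10.1073/pnas.1607606113 The authors also released their data: SRA288429 and SRA291304.
The authors first performed WES and RNA-seq on NPC patients, then used the somatic SNV results and gene-level comparisons to define a 364-gene panel for targeted capture sequencing.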

  • We performed WES with 51 primary tumors and 8 recurrent tumors, 3 of which had matching lymph node metastatic tumors, and used targeted resequencing for an additional 73 primary tumors (SI Appendix, Table S1).
  • Paired blood samples were also sequenced as references. RNA sequencing (RNASeq) was performed in 10 tumor pairs with adequate quality and quantity of RNA. The overall workflow is shown in SI Appendix, Fig. S1A.
  • After removing duplicates, the mean target coverages of tumor and blood samples were 70× and 49× in WES (12% average duplication rate) and 190× and 68× in targeted resequencing (9% average duplication rate), respectively. For tumor samples in WES, 72% of bases were covered at least 30×, and 50% of bases were covered at least 50× (SI Appendix, Fig. S1B and Table S2).
  • Overall, 1,374 nonsilent somatic mutations that change the protein sequences or involve splice sites in 1,242 genes were identified across 51 primary tumors, and 457 nonsilent somatic mutations in 438 genes were identified in recurrent and metastatic tumors by WES (Fig. 1A and SI Appendix, Tables S3 and S4).
  • Subsequently, in total, 186 nonsilent somatic mutations in 123 genes were identified across 73 primary tumors in targeted resequencing (SI Appendix, Table S5).
  • In our study, the median mutation rate of NPC is 0.9 somatic mutations per megabase in coding regions (SI Appendix, Fig. S2). Somatic mutations were verified by Sanger sequencing or RNASeq, and a verification rate of 95% was achieved (SI Appendix, Table S6).
The most important step is finding driver mutations: the 2014 data were combined with this cohort and run through the MutSigCV pipeline.
  1. Lin DC, et al. (2014) The genomic landscape of nasopharyngeal carcinoma. Nat Genet 46(8):866–871.
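
The mutation rate quoted above is expressed as somatic mutations per megabase of coding sequence. As a rough sketch of how such a figure is derived: divide each tumor's mutation count by the callable coding territory in Mb, then take the median across tumors. The 30 Mb territory and the per-tumor counts below are hypothetical values chosen for illustration, not numbers from the paper.

```python
from statistics import median


def mutations_per_mb(n_mutations, callable_bases):
    """Somatic mutations per megabase of callable sequence."""
    return n_mutations / (callable_bases / 1_000_000)


# Hypothetical inputs: a 30 Mb callable coding territory and
# invented per-tumor counts of somatic coding mutations.
CALLABLE_BASES = 30_000_000
counts_per_tumor = [18, 27, 45]

rates = [mutations_per_mb(n, CALLABLE_BASES) for n in counts_per_tumor]
print("Per-tumor rates:", rates)
print("Median rate:", median(rates))
```

Using the median rather than the mean keeps a single hypermutated tumor from dominating the cohort-level figure.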

2013: blastic plasmacytoid dendritic cell neoplasm

Blastic plasmacytoid dendritic cell neoplasm (BPDCN)
Whole-exome sequencing (WES) of three BPDCN cases (37-99 deleterious gene mutations)
Targeted sequencing of 38 selected genes in 25 BPDCN samples
That said, WGS analysis pipelines for healthy individuals are also worth studying.

2017: how to analyze WGS of 24 individuals

It mostly comes down to analysis technique: Published online 2017 Dec 12. doi: 10.1038/s41467-017-00663-9
The project, the Southern African Human Genome Programme, really just performed 50X WGS on 24 individuals (8 Coloured and 16 black southeastern Bantu-speakers); of the 16.3 million SNVs found, 1,703 remained after downstream filtering. Worth noting: Each sample was also genotyped using the Illumina Omni2.5 genotyping array.
The data were uploaded somewhere that is not easy to download from: https://www.ebi.ac.uk/ega/datasets/EGAD00010001418

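2016: TCGA pancreatic cancer

The TCGA project sequenced 456 patients with pancreatic ductal adenocarcinoma in total; a large sample size inevitably means a large workload.
The work applied whole-genome resequencing, exome sequencing, transcriptome sequencing, CNV arrays, expression arrays, methylation arrays, and more to these samples, as is standard for a TCGA project.
The analyses cover somatic mutation calling, mutational signature analysis, significantly mutated gene analysis, expression analysis, co-expression analysis, gene enrichment analysis, copy number variation analysis, rearrangement detection, methylation analysis, survival analysis, and so on.
The main finding is a classification of pancreatic cancer into four subtypes based on mutational background. The study also identified 32 recurrently mutated genes in pancreatic cancer and ten signaling pathways.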

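2017: comparing pancreatic adenosquamous carcinoma and pancreatic ductal adenocarcinoma

Researchers at Ruijin Hospital, Shanghai Jiao Tong University, published a paper in the Journal of Pathology in June 2017 that performed whole-genome and whole-exome sequencing and analysis on 17 patients with pancreatic adenosquamous carcinoma (PASC) and 34 with pancreatic ductal adenocarcinoma (PDAC).
This is the first tally of gene mutations in pancreatic adenosquamous carcinoma, for which samples are hard to obtain.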

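  • The comparison shows that KRAS, TP53, and SMAD4 are frequently mutated genes in both PASC and PDAC, but TP53 is mutated more frequently in PASC than in PDAC.
  • At the CNV level, gains of 3q, 8q (MYC), 12p (KRAS), and chromosome 19 and losses of 3p (FHIT), 4p, 6q, 8p, 9p (CDKN2A), 17p (TP53), and 18q (SMAD4) are common recurrent CNVs, but 3p loss is more frequent in PASC than in PDAC.
The paper's main significance is that it analyzed the mutational landscape of pancreatic adenosquamous carcinoma for the first time and compared it with the more common pancreatic ductal adenocarcinoma.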

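2017: non-coding mutation analysis in pancreatic ductal adenocarcinoma

Feigin et al. at Cold Spring Harbor Laboratory published a paper in 2017 analyzing non-coding mutations in pancreatic ductal adenocarcinoma. It opens a new chapter for the analysis of genome resequencing data. Earlier papers had explored non-coding mutations in cancer, but most work still focuses on coding regions. After all, coding mutations are more interpretable, easier to validate in follow-up experiments, and reflect the underlying problem more directly, whereas non-coding mutations mainly act through regulation and do not directly change protein structure or function. Yet most of the genome is non-coding, so a method for studying non-coding regions was urgently needed, and this paper answers that need. The method it provides is called GECCO.
The main idea is to combine mutation data with expression data. First, mutations are refined and filtered; then mutations that affect expression are selected and tested statistically. Next, the mutation rate of each mutation cluster is computed, along with an expression-adjusted score. Finally, the mutations and genes found are subjected to pathway and prognosis analysis. Notably, all of the data in this paper were downloaded from ICGC, with no newly generated data. The recurrent non-coding mutations the paper ultimately found in pancreatic ductal adenocarcinoma are mainly enriched in the axon guidance and cell adhesion pathways, along with some newly identified genes.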
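
To make the "select mutations that affect expression and test them statistically" step concrete, here is a generic permutation test comparing a gene's expression between samples that carry a non-coding mutation and samples that do not. This is not GECCO's actual statistic, which the text above does not specify; it is a swapped-in, commonly used test, and the function name and expression values are invented for illustration.

```python
import random
from statistics import mean


def permutation_pvalue(mutated, wild_type, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean expression.

    Group labels are shuffled `n_perm` times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one
    (with the usual +1 correction so it is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(mutated) - mean(wild_type))
    pooled = list(mutated) + list(wild_type)
    k = len(mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented expression values (arbitrary units) for one gene.
expr_mutated = [9.1, 8.7, 9.4, 8.9]
expr_wild_type = [5.0, 5.3, 4.8, 5.1, 5.2, 4.9]

p_value = permutation_pvalue(expr_mutated, expr_wild_type)
print("p-value:", p_value)
```

A permutation test makes no distributional assumption about expression values, which is convenient when only a handful of samples carry a given mutation.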
