"Somatic mutations" is a broad term that covers SNVs, indels, CNAs, SVs, and more.
However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations.
Cancer is driven largely by somatic mutations that accumulate in the genome over an individual’s lifetime, with additional contributions from epigenetic and transcriptomic alterations.
Studies from the low-throughput era produced some successes: imatinib has been used to target cells expressing the BCR-ABL fusion gene in chronic myeloid leukemia,
and gefitinib has been used to inhibit the epidermal growth factor receptor in lung cancer.
But these successes are far from enough.
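Three major challenges for NGS: 1, identifying somatic mutations (sequencing errors and tumor heterogeneity); 2, identifying driver genes; 3, determining the pathways and other biological processes altered by somatic mutations.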
Sources of error: optical PCR duplicates, GC-bias, strand bias (where reads indicating a possible mutation only align to one strand of DNA) and alignment artifacts resulting from low complexity or repetitive regions in the genome.
Most methods for somatic mutation detection address only a subset of the possible sources of error, and there are many SNP-calling tools to choose from.
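To make the strand-bias check concrete, here is a minimal sketch that compares the strand distribution of reference-supporting and mutation-supporting reads with a two-sided Fisher's exact test. The function name, the read counts, and the choice of Fisher's test are illustrative assumptions, not something taken from the source; real callers combine this kind of filter with many others.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
        rows    = reference reads, alternate (mutation-supporting) reads
        columns = forward strand, reverse strand
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p_value = sum(
        p for p in map(table_prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Made-up counts: all 12 alternate reads fall on the forward strand
p_biased = fisher_exact_two_sided(50, 48, 12, 0)
# Made-up counts: alternate reads are split evenly between strands
p_balanced = fisher_exact_two_sided(50, 48, 6, 6)
print(p_biased, p_balanced)
```

A small p-value means the mutation-supporting reads sit on one strand far more often than the reference reads do, which suggests an artifact more than a real variant.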
Three key approaches to identifying driver mutations:
1, identifying recurrent mutations;
2, predicting the functional impact of individual mutations;
3, assessing combinations of mutations using pathways, interaction networks, or statistical correlations.
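Each of the three approaches has spawned a large number of tools. Their problems are:
1, Tools that look directly at mutation frequency determine whether the observed number of mutations in the gene is significantly greater than the number expected according to a background mutation rate (BMR).
The BMR is very hard to estimate: too low an estimate yields many false positives, while too high an estimate misses many true driver mutations. Genes with a very high mutation frequency are safe either way; TP53, for example, is called a driver gene by any algorithm.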
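The sensitivity to the BMR can be illustrated with a simple binomial model. The sketch below is an assumption-laden toy: it treats every base in every sample as an independent trial with a single mutation probability, and the gene length, sample count, mutation count, and the two BMR values are made-up numbers. It ignores multiple-testing correction and the gene-specific factors that real tools model.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), stable for large n."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X < k)."""
    if k <= 0:
        return 1.0
    lower = sum(exp(log_binom_pmf(i, n, p)) for i in range(k))
    return max(0.0, 1.0 - lower)


# Hypothetical gene: 1,500 coding bases, sequenced in 200 samples, 6 mutations seen
gene_length, n_samples, observed = 1500, 200, 6
n_trials = gene_length * n_samples

# Same observation, two assumed BMRs (mutations per base per sample)
p_low_bmr = binomial_upper_tail(observed, n_trials, 1e-6)
p_high_bmr = binomial_upper_tail(observed, n_trials, 1e-5)
print(p_low_bmr, p_high_bmr)
```

With the lower BMR the gene looks strongly significant; with the tenfold higher BMR the same six mutations no longer pass a 0.05 threshold. This is the false-positive versus missed-driver trade-off described above.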
2, Tools that score the impact of a mutation on protein function bring in some prior assumptions:
evolutionary conservation,
known protein domains,
non-random clustering of mutations,
protein structure.
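3, Tools based on pathways, interaction networks, and de novo approaches:
Pathway methods (KEGG, GO, GSEA) have 4 limitations. First, most annotated gene sets contain too many genes, so the mutated genes make up too small a fraction of the gene set to reach statistical significance.
Second, pathways are not independent; the connections between pathways matter more.
Third, splitting genes into small units such as pathways ignores connections outside each unit.
Finally, these methods consider only known pathways or gene sets.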
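The first limitation (large gene sets diluting the signal) can be seen with a hypergeometric over-representation test, which is one common way to score gene-set enrichment. The sketch below is illustrative only: the genome size, the number of mutated genes, and the two gene-set sizes are assumed values, and the tools named above use more elaborate statistics.

```python
from math import comb


def hypergeom_upper_tail(k, total_genes, set_size, n_mutated):
    """P(X >= k): chance that at least k of n_mutated genes, drawn at random
    from total_genes, land in a gene set containing set_size genes."""
    upper = min(set_size, n_mutated)
    favorable = sum(
        comb(set_size, i) * comb(total_genes - set_size, n_mutated - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(total_genes, n_mutated)


# Assumed numbers: 20,000 genes in total, 100 of them mutated,
# and 5 mutated genes falling inside the gene set of interest
total_genes, n_mutated, overlap = 20000, 100, 5

p_small_set = hypergeom_upper_tail(overlap, total_genes, 20, n_mutated)
p_large_set = hypergeom_upper_tail(overlap, total_genes, 1000, n_mutated)
print(p_small_set, p_large_set)
```

Five mutated genes in a 20-gene set is a very unlikely coincidence, whereas five in a 1,000-gene set is about what random chance predicts, so the same overlap carries no evidence once the annotated set is large.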
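The past five years have seen dramatic changes in cancer genome sequencing research, but several challenges remain before it reaches real clinical application:
First, non-coding somatic mutations have been overlooked.
Second, many of the cancer types we define are actually a mixture of subtypes.
Third, it is unclear which cancer types can be combined and studied together.
Finally, it is unclear how to integrate different types of NGS data, including WGS, WES, RNA sequencing, DNA methylation, and chromatin modifications.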
For some patients, precision cancer medicine has already arrived, but for most patients there is still a long way to go.