Daily Archives: August 31, 2015
2014-4742samples-21tumors-Cancer5000-set-254-genes
2015-MADGiC-identify-cancer-driver-gene
2014-REVIEW-identifying driver mutation in sequenced cancer genome
2014-review-Next-generation sequencing to guide cancer therapy
This reductionist thinking led the initial theories on carcinogenesis to be centered on how many “hits” or genetic mutations were necessary for a tumor to develop.
Literature notes-2015-nature-molecular analysis of gastric cancer: a new classification and prognosis study
Paper: Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes
A small pre-defined set of gene expression signatures
epithelial-to-mesenchymal transition (EMT)
microsatellite instability (MSI)
cytokine signaling
cell proliferation
DNA methylation
TP53 activity
gastric tissue
The classical classification is: Gastric cancer may be subdivided into 3 distinct subtypes (proximal, diffuse, and distal gastric cancer) based on histopathologic and anatomic criteria. Each subtype is associated with unique epidemiology.
We used principal component analysis (PCA):
PC1
PC2
PC3
These three principal components are associated with the seven signatures listed above.
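To make the PCA step concrete, here is a minimal sketch in base R. The score matrix is random placeholder data (300 samples by 7 signatures), and the column names are shorthand chosen for this illustration, so the output shows only the mechanics, not the paper's results.

```r
# Placeholder data: 300 samples x 7 signature scores (random, for illustration only).
set.seed(1)
signature_names <- c("EMT", "MSI", "cytokine", "proliferation",
                     "methylation", "TP53", "gastric_tissue")
scores <- matrix(rnorm(300 * 7), nrow = 300,
                 dimnames = list(NULL, signature_names))

# PCA on centered, unit-variance signatures.
pca <- prcomp(scores, center = TRUE, scale. = TRUE)

# Sample coordinates on the first three principal components.
pcs <- pca$x[, 1:3]

# Loadings show how each signature contributes to PC1-PC3.
loadings <- pca$rotation[, 1:3]
print(round(loadings, 2))
```

With real signature scores, the loadings table is where one would read off which signatures drive each component.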
Based on the principal component analysis, the 300 GC samples can be divided into the following four groups, named as follows:
Gene expression signatures define four molecular subtypes of GC:
MSI (n = 68),
MSS/EMT (n = 46),
MSS/TP53+ (n = 79)
MSS/TP53− (n = 107)
The paper's classification method was then tested on two other published datasets, which again split into four groups
(MSI, MSS/EMT, MSS/TP53+ and MSS/TP53-)
TCGA database: n = 46, n = 62, n = 50 and n = 47, respectively.
Singapore study: n = 12, n = 85, n = 39 and n = 63, respectively.
This grouping reveals some patterns:
(i) The MSS/EMT subtype occurred at a significantly younger age (P = 3e-2) than did other subtypes. The majority (>80%) of the subjects in this subtype were diagnosed with diffuse-type (P < 1e-4) at stage III/IV (P = 1e-3).
(ii) The MSI subtype occurred predominantly in the antrum (75%), >60% of subjects had the intestinal subtype, and >50% of subjects were diagnosed at an early stage (I/II).
(iii) Epstein-Barr virus (EBV) infection occurred more frequently in the MSS/TP53+ group (n = 12/18, P = 2e-4) than in the other groups.
Survival analysis was then performed on the 300 samples:
Prognosis: MSI > MSS/TP53+ > MSS/TP53− > MSS/EMT
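As a reminder of what a survival curve computes, the sketch below is a hand-written Kaplan-Meier estimator in base R. The follow-up times and event flags are made up for illustration, and the function name `km_estimate` is hypothetical; the paper's actual analysis is not reproduced here.

```r
# Kaplan-Meier estimate: at each event time, multiply the running survival
# probability by (1 - deaths / number at risk).
km_estimate <- function(time, event) {
  event_times <- sort(unique(time[event == 1]))
  survival <- numeric(length(event_times))
  s <- 1
  for (i in seq_along(event_times)) {
    at_risk <- sum(time >= event_times[i])
    deaths <- sum(time == event_times[i] & event == 1)
    s <- s * (1 - deaths / at_risk)
    survival[i] <- s
  }
  data.frame(time = event_times, survival = survival)
}

# Made-up follow-up data: event = 1 means death, 0 means censored.
followup_months <- c(5, 8, 12, 12, 20)
died <- c(1, 0, 1, 1, 0)
km <- km_estimate(followup_months, died)
print(km)
```

Running the same function once per subtype and comparing the curves is the usual way to see an ordering like the one above.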
Next, we validated the survival trend of GC subtypes in three independent cohorts: Samsung Medical Center cohort 2 (SMC-2, n = 277, GSE26253) [31],
Singapore cohort (n = 200, GSE15459) [21] and
TCGA gastric cohort (n = 205).
We saw that the GC subtypes showed a significant association with overall survival.
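Conclusion: this classification is the most reasonable one, and it tracks the prognosis of each subtype closely.
Next, the mutation patterns: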
The MSI subtype was associated with hypermutation, with mutations in KRAS (23.3%), the PI3K-PTEN-mTOR pathway (42%), ALK (16.3%) and ARID1A (44.2%) [18].
We observed enrichment of PIK3CA H1047R mutations in the MSI samples
We saw enrichment of E542K and E545K mutations in MSS tumors.
The EMT subtype had a lower number of mutation events when compared to the other MSS groups (P = 1e−3).
The MSS/TP53− subtype showed the highest prevalence of TP53 mutations (60%), with a low frequency of other mutations
the MSS/TP53+ subtype showed a relatively higher prevalence (compared to MSS/TP53−) of mutations in APC, ARID1A, KRAS, PIK3CA and SMAD4.
Next, copy number variation:
Next, a comparison with the classifications from the other two research groups:
The TCGA study reported expression clusters (subtypes named C1–C4) and genomic subtypes (subtypes named EBV+, MSI, Genome Stable (GS) and Chromosomal Instability (CIN)).
A follow-up study of the Singapore cohort [21] described three expression subtypes (Proliferative, Metabolic and Reactive).
However, a consensus on clinically relevant subtypes that encompasses molecular heterogeneity and that can be used in preclinical and clinical research has not been reported.
Here we report the molecular classification of GC linked not only to distinct patterns of genomic alterations, but also to recurrence pattern and prognosis across multiple GC cohorts.
microsatellite instability
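English abbreviation: MI
Chinese name: 微卫星不稳定性 (microsatellite instability)
Category: Biological sciences
Entry summary: Microsatellite instability (MI) testing builds on the discovery of VNTRs. The genome contains a large number of tandem repeat sequences. Tandem repeats with a 6-7 bp unit are generally called minisatellite DNA, also known as VNTR, while tandem repeats with a 1-4 bp unit are called microsatellite DNA, also known as simple repeat sequences (SRS). SRS are among the most common repeat sequences and offer rich polymorphism, high heterozygosity and a low recombination rate. The most common are dinucleotide repeats, namely (AC)n and (TG)n. Studies show that when n≥104, 2 bp repeats are highly polymorphic in the population. SRS are widespread in prokaryotic and eukaryotic genomes, make up about 5% of eukaryotic genomes, and are one of the new DNA polymorphism markers that have developed rapidly in recent years. Microsatellite instability (MI) refers to the gain or loss of simple repeat sequences. MI was first observed in colon cancer. In 1993, gains or losses of (AC)n repeats on multiple chromosomes were observed in HNPCC, and microsatellite instability was later found in gastric, pancreatic, lung, bladder, breast, prostate and other cancers, suggesting that MI may be another important molecular marker of tumor cells. Results show that MI is related to tumor development; MI has been found only in tumor cells and has never been detected in normal tissue. In both primary and metastatic tumors, MI is distributed evenly throughout the tumor. The frequency of MI in advanced gastric cancer is significantly higher than in early gastric cancer.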
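Literature notes-2010-R-software-identify-cancer_driver_genes
We tested an R program that finds driver genes in cancer, using data from 188 non-small cell lung tumors.
Software download: http://linus.nci.nih.gov/Data/YounA/software.zip
It is an R program that ships with a readme, and usage is simple.
Prepare two files, silent_mutation_table.txt and nonsilent_mutation_table.txt. Both are plain-text files with the content shown below: take the SNPs you have found, format them, and split them into silent and nonsilent according to the annotation results.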
#Ensembl_gene_id Chromosome Start_position Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 Tumor_Sample_Barcode
#ENSG00000122477 1 100390656 SNP G G A TCGA-23-1022-01A-02W-0488-09
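The split into the two input files could be sketched as follows. This is an assumption-laden illustration: the `Variant_Classification` column and its values are hypothetical stand-ins for whatever your annotation tool produces, the second row is made up, and the exact header format the program expects should be checked against its readme.

```r
# Toy annotated mutation table using the column names shown above.
# Row 1 is the example row from the notes; row 2 is made up.
mutations <- data.frame(
  Ensembl_gene_id = c("ENSG00000122477", "ENSG00000122477"),
  Chromosome = c("1", "1"),
  Start_position = c(100390656L, 100390700L),
  Variant_Type = "SNP",
  Reference_Allele = c("G", "C"),
  Tumor_Seq_Allele1 = c("G", "C"),
  Tumor_Seq_Allele2 = c("A", "T"),
  Tumor_Sample_Barcode = "TCGA-23-1022-01A-02W-0488-09",
  Variant_Classification = c("Missense_Mutation", "Silent"),  # hypothetical column
  stringsAsFactors = FALSE
)

is_silent <- mutations$Variant_Classification == "Silent"

# Write the eight original columns to the two files (in a temp directory here).
out_dir <- tempdir()
write.table(mutations[is_silent, 1:8],
            file.path(out_dir, "silent_mutation_table.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(mutations[!is_silent, 1:8],
            file.path(out_dir, "nonsilent_mutation_table.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
```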
Then run the package's main script directly from within R: source("main_R_script.r")
We reanalyzed sequence data for 623 candidate genes in 188 non-small cell lung tumors using the new method.
The goal is to identify genes that are frequently mutated and are therefore expected to have primary roles in the development of tumors.
To find these driver genes, each gene is tested for whether its mutation rate is significantly higher than the background (or passenger) mutation rate.
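The basic idea of testing a gene against the background rate can be sketched with a one-sided binomial test. This is a deliberately simplified illustration, not the paper's method (which scores mutation types and uses empirical Bayes); the gene length, background rate and mutation count are hypothetical numbers, and only the sample count of 188 comes from the notes.

```r
# Hypothetical inputs for one gene.
gene_length_bp <- 1500      # assumed coding length
n_samples <- 188            # number of tumors, as in the notes
background_rate <- 1e-6     # assumed per-base, per-sample background mutation rate
observed_mutations <- 5     # assumed number of mutations seen in this gene

# Each base in each sample is treated as one independent trial.
n_trials <- gene_length_bp * n_samples

# One-sided test: is the observed count higher than the background predicts?
result <- binom.test(observed_mutations, n_trials,
                     p = background_rate, alternative = "greater")
print(result$p.value)
```

In practice the test would be run for every gene and the p-values corrected for multiple testing.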
Some investigators (Sjoblom et al., 2006) further divide mutations into several types according to the nucleotide and the neighboring nucleotides of the mutations.
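Three shortcomings of the method of Ding et al. (2008):
1. Different types of mutations can have different impact on proteins. (The more a mutation affects protein function, the more likely it is a driver mutation.)
2. Different samples have different background mutation rates. (A mutation in a sample with a high background mutation rate is likely a product of that high background rather than of the cancer.)
3. A different number of non-silent mutations can occur at each base pair according to the genetic code. (For example, tryptophan has only one codon, whereas arginine has as many as six.)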
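The third point can be checked directly from the standard genetic code. The sketch below builds the codon table in base R, counts codons per amino acid, and counts how many of the nine possible single-base substitutions of a codon change the encoded amino acid. The helper name `count_nonsilent` is introduced here for illustration.

```r
# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
names(aa) <- codons

# Number of codons for tryptophan (W) and arginine (R).
trp_codons <- sum(aa == "W")
arg_codons <- sum(aa == "R")

# Count single-base substitutions of a codon that change the amino acid
# (a change to a stop codon also counts as non-silent).
count_nonsilent <- function(codon) {
  chars <- strsplit(codon, "")[[1]]
  n <- 0
  for (pos in 1:3) {
    for (b in setdiff(bases, chars[pos])) {
      mutant <- chars
      mutant[pos] <- b
      if (aa[[paste(mutant, collapse = "")]] != aa[[codon]]) n <- n + 1
    }
  }
  n
}

print(c(TGG = count_nonsilent("TGG"), CGA = count_nonsilent("CGA")))
```

Every substitution in the tryptophan codon TGG is non-silent, while the arginine codon CGA tolerates several silent changes, which is why the silent/non-silent split depends on the codon.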
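Four advantages of the proposed method:
1. Non-synonymous mutations are scored according to their impact on protein function.
2. Different samples are allowed to have different background mutation rates (BMR).
3. Whether a mutation is non-silent or silent depends on the genetic code, and the method accounts for this.
4. Uncertainties in the background mutation rate are taken into account by using empirical Bayes methods.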
Five areas that still need improvement:
1. However, the functional impact is also dependent on the position in which a mutation occurs. (The method only considers the amino acid change caused by the mutation.)
2. The current scoring system, which assigns mutation scores in the order missense mutation < inframe indel < mutation in splice sites < frameshift indel = nonsense mutation, may be biased toward identifying tumor suppressor genes over oncogenes.
3. We may refine our background mutation model in Table 1 so that all six types of mutations (A:T→G:C, A:T→C:G, A:T→T:A, G:C→A:T, G:C→T:A, G:C→C:G) have separate mutation rates.
4,we did not take into account correlations among mutations in identifying driver genes.
5,one might combine both copy number variation and sequencing data to identify driver genes.
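To illustrate the six substitution classes mentioned in point 3, the sketch below maps a reference/alternate allele pair to its class by collapsing complementary changes (for example, C>T on one strand is the same event as G>A on the other). The function name `classify_substitution` is hypothetical, and ">" is used in place of the arrow to keep the code ASCII-only.

```r
# Watson-Crick complements.
comp <- c(A = "T", T = "A", G = "C", C = "G")

# Map single-base substitutions to one of six strand-symmetric classes,
# e.g. both G>A and C>T become "G:C>A:T".
classify_substitution <- function(ref, alt) {
  # Express every change with a purine (A or G) as the reference base.
  flip <- ref %in% c("T", "C")
  ref_n <- ifelse(flip, comp[ref], ref)
  alt_n <- ifelse(flip, comp[alt], alt)
  paste0(ref_n, ":", comp[ref_n], ">", alt_n, ":", comp[alt_n])
}

# The example row in the notes has reference G and tumor allele A.
print(classify_substitution(c("G", "C", "T"), c("A", "T", "G")))

# Tabulating classes across all mutations gives per-class counts,
# the raw material for separate per-class background rates.
print(table(classify_substitution(c("G", "C", "T", "A"), c("A", "T", "G", "C"))))
```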
R code to convert HGNC gene symbols to Ensembl database IDs:
library(biomaRt)
# Connect to the Ensembl human gene dataset.
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Input: one HGNC symbol per line, no header.
all.gene.table <- read.table("all_gene.symbol", header = FALSE, stringsAsFactors = FALSE)
convert <- getBM(attributes = c("chromosome_name", "ensembl_gene_id", "hgnc_symbol"),
                 filters = "hgnc_symbol", values = all.gene.table[, 1], mart = ensembl)
# Keep only genes whose chromosome is 1-22, X, Y, or M (note: Ensembl labels the mitochondrial chromosome "MT", so "M" matches nothing).
chromosome <- c(1:22, "X", "Y", "M")
convert <- convert[convert[, 1] %in% chromosome, 2:3]
# Drop rows with an empty Ensembl ID or gene symbol.
convert <- convert[rowSums(convert == "") == 0, ]
write.table(convert, "ensembl2symbol.list", quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(convert, "all_gene_name.txt", quote = FALSE, row.names = FALSE, col.names = FALSE)
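One gene symbol may map to several Ensembl IDs, but on each chromosome the mapping is one-to-one.
Some gene symbols may have no Ensembl ID at all, usually because the symbol is not a strict HGNC symbol or is an outdated one.
For example, TIGAR, shown below, is quite likely to be written as C12orf5.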
Aliases for TIGAR Gene
TP53 Induced Glycolysis Regulatory Phosphatase
TP53-Induced Glycolysis And Apoptosis Regulator
C12orf5
Probable Fructose-2,6-Bisphosphatase TIGAR
Fructose-2,6-Bisphosphate 2-Phosphatase
Chromosome 12 Open Reading Frame 5
Fructose-2,6-Bisphosphatase TIGAR
Transactivated By NS3TP2 Protein
EC 3.1.3.46
FR2BP
External Ids for TIGAR Gene
HGNC: 1185 Entrez Gene: 57103 Ensembl: ENSG00000078237 OMIM: 610775 UniProtKB: Q9NQ88
Previous HGNC Symbols for TIGAR Gene
C12orf5
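The two mapping problems described above (outdated symbols, and symbols with several Ensembl IDs) can be checked with a few lines of base R. In this sketch the lookup table is a toy: only the TIGAR row uses the real ID listed above, while `GENE_X` and the `ENSG_PLACEHOLDER_*` IDs are made-up placeholders, and the `previous_to_current` map is a hand-written stand-in for a real table of previous HGNC symbols.

```r
# Toy lookup result shaped like the `convert` table (Ensembl ID, HGNC symbol).
convert <- data.frame(
  ensembl_gene_id = c("ENSG00000078237", "ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2"),
  hgnc_symbol = c("TIGAR", "GENE_X", "GENE_X"),
  stringsAsFactors = FALSE
)

# Symbols as they appear in the input gene list.
requested <- c("C12orf5", "GENE_X")

# Hand-written map from previous symbols to current HGNC symbols.
previous_to_current <- c(C12orf5 = "TIGAR")

# Replace outdated symbols before looking them up.
is_old <- requested %in% names(previous_to_current)
resolved <- requested
resolved[is_old] <- unname(previous_to_current[requested[is_old]])

# Symbols that still have no Ensembl ID after resolving aliases.
missing_symbols <- setdiff(resolved, convert$hgnc_symbol)

# Symbols that map to more than one Ensembl ID.
id_counts <- table(convert$hgnc_symbol)
multi_id_symbols <- names(id_counts)[id_counts > 1]

print(list(resolved = resolved, missing = missing_symbols, multi_id = multi_id_symbols))
```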