Purpose: further annotating VCF-format mutation data into MAF format
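Anyone who has analyzed cancer data knows that TCGA records mutations in MAF format. So where does MAF-format data come from? As we all know, SNP calling usually produces a VCF file of mutation records, and annotating that file with ANNOVAR or other software that predicts effects on protein structure and function still falls far short of the nearly 100 columns in a MAF file.
The well-known Broad Institute's approach to producing MAF-format mutation annotation files is to draw on a dozen or so widely used databases to annotate the VCF mutation records we obtain. Usually only somatic mutations are annotated into MAF format.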
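The Broad Institute's mutation annotation tool, Oncotator, is described in this paper: http://www.ncbi.nlm.nih.gov/pubmed/25703262
The source code is on GitHub: https://github.com/broadinstitute/oncotator
Oncotator is also available as an online tool.
Input data guide: https://www.broadinstitute.org/oncotator/help/#inputformat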
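To make the VCF-to-input step concrete, here is a minimal Python sketch that turns VCF data lines into simple tab-separated variant rows. The five column names (chr, start, end, ref_allele, alt_allele) are my assumption about what the input guide asks for, so check the guide above before relying on them. The function names are hypothetical, and the indel handling follows the common MAF convention of dropping the padding base and using "-" for the missing allele.

```python
def vcf_line_to_rows(line):
    """Convert one VCF data line into (chr, start, end, ref_allele, alt_allele) tuples.

    Assumes the standard VCF column order: CHROM, POS, ID, REF, ALT, ...
    Symbolic alleles such as <DEL> or * are skipped, not converted.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    rows = []
    for alt in fields[4].split(","):  # one row per alternate allele
        if alt.startswith("<") or alt in ("*", "."):
            continue
        r, a, start = ref, alt, pos
        # VCF pads indels with shared leading bases; trim them off.
        while r and a and r[0] == a[0]:
            r, a, start = r[1:], a[1:], start + 1
        if not r and not a:
            continue  # REF and ALT were identical, nothing to report
        if not r:
            # Insertion: falls between the two flanking reference bases.
            rows.append((chrom, start - 1, start, "-", a))
        elif not a:
            # Deletion: spans the deleted reference bases.
            rows.append((chrom, start, start + len(r) - 1, r, "-"))
        else:
            # SNV or multi-base substitution.
            rows.append((chrom, start, start + len(r) - 1, r, a))
    return rows


def vcf_text_to_table(vcf_text):
    """Build a tab-separated table (with header) from the text of a VCF file."""
    header = ("chr", "start", "end", "ref_allele", "alt_allele")  # assumed names
    out = ["\t".join(header)]
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        for row in vcf_line_to_rows(line):
            out.append("\t".join(str(x) for x in row))
    return "\n".join(out)


if __name__ == "__main__":
    example = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "1\t100\t.\tA\tG\t.\tPASS\t.\n"
        "1\t200\t.\tAT\tA\t.\tPASS\t.\n"
    )
    print(vcf_text_to_table(example))
```

This only covers the coordinate and allele columns. All of the remaining MAF columns come from the annotation step itself, which is what the tool adds.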
Oncotator also provides an API, and it integrates all of the annotation resources listed below:
Genomic Annotations
- Gene, transcript, and functional consequence annotations using GENCODE for hg19.
- Reference sequence around a variant.
- GC content around a variant.
- Human DNA Repair Gene annotations from Wood et al.
Protein Annotations
Cancer Variant Annotations
- Observed cancer mutation frequency annotations from COSMIC.
- Cancer gene and mutation annotations from the Cancer Gene Census.
- Overlapping mutations from the Cancer Cell Line Encyclopedia.
- Cancer gene annotations from the Familial Cancer Database.
- Cancer variant annotations from ClinVar.
Non-Cancer Variant Annotations
- Common SNP annotations from dbSNP.
- Variant annotations from 1000 Genomes.
- Variant annotations from NHLBI GO Exome Sequencing Project (ESP).
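Two of the genomic annotations listed above, the reference sequence and the GC content around a variant, are easy to picture with a small Python sketch. This is only an illustration of the idea, not how Oncotator computes them: the function names and the flank size are hypothetical, and the reference is passed in as a plain string rather than read from a genome file.

```python
def window_around(reference, pos, flank):
    """Return the reference bases within `flank` bases of a 1-based position."""
    start = max(0, pos - 1 - flank)        # convert to 0-based, clamp at the start
    end = min(len(reference), pos + flank)  # clamp at the end of the sequence
    return reference[start:end]


def gc_content(sequence):
    """Fraction of G/C bases among the A/C/G/T bases of a sequence."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        return 0.0  # no callable bases (for example, all N)
    return sum(base in "GC" for base in counted) / len(counted)


if __name__ == "__main__":
    reference = "AACCGGTTAA"  # toy reference sequence
    context = window_around(reference, pos=5, flank=2)
    print(context, gc_content(context))
```

Ambiguous bases such as N are left out of the denominator here. That is a design choice for this sketch, and a real tool may count them differently.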
Because there is quite a lot of data to download, I won't test it on my own computer here. The installation process itself is simple.