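By analyzing the dbSNPv138 and ESP6500 databases, the authors identified 100 frequently mutated genes and then compared this gene set with gene lists from several other databases. The paper is titled "FLAGS, frequently mutated genes in public exomes": https://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-014-0064-y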
We used publicly available exome cohorts, together with the dbSNP database, to derive a list of genes (n = 100) that most frequently exhibit rare (<1%) non-synonymous/splice-site variants in general populations. We termed these genes FLAGS for FrequentLy mutAted GeneS and analyzed their properties.
Dataset | Size (number of genes) | Description |
---|---|---|
FLAGS | 100 | The top 100 of FrequentLy mutAted GeneS with rare (<1% allelic frequency) functional variants from dbSNPv138 and ESP6500 |
OMIM | 3099 | The list of protein-coding genes associated with human diseases from Online Mendelian Inheritance in Man [8] |
HGMD | 2691 | The list of protein-coding genes with damaging mutations (<1% allelic frequency) from Human Gene Mutation Database [28]. |
WES | 300 | Downloaded from Boycott et al. (2013) [7] - a list of novel genes implicated in human disorders based on whole exome sequencing studies, or novel/known pathogenic mutations discovered by whole-exome sequencing. |
Background | 18580 | The entire set of human protein-coding genes that have complete start and end translation annotations with a specified dN/dS ratio |
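
The FLAGS row above amounts to a simple ranking: count the rare (<1% allelic frequency) functional variants per gene and keep the top 100. The following is only a minimal sketch of that idea, not the authors' actual pipeline. The record layout (`gene`, `af`, `effect`), the effect labels, and the toy values are all assumptions made for illustration.

```python
from collections import Counter

def top_mutated_genes(variants, max_af=0.01, top_n=100):
    """Rank genes by their number of rare functional variants."""
    # Assumed effect labels standing in for non-synonymous/splice-site variants.
    functional = {"missense", "nonsense", "splice_site"}
    counts = Counter(
        v["gene"]
        for v in variants
        if v["af"] < max_af and v["effect"] in functional
    )
    # most_common() sorts genes by descending variant count.
    return [gene for gene, _ in counts.most_common(top_n)]

# Toy records (hypothetical values, for illustration only).
variants = [
    {"gene": "TTN", "af": 0.001, "effect": "missense"},
    {"gene": "TTN", "af": 0.004, "effect": "splice_site"},
    {"gene": "TTN", "af": 0.200, "effect": "missense"},      # common, skipped
    {"gene": "MUC16", "af": 0.002, "effect": "missense"},
    {"gene": "BRCA2", "af": 0.003, "effect": "synonymous"},  # not functional, skipped
]
print(top_mutated_genes(variants, top_n=2))  # ['TTN', 'MUC16']
```

In a real analysis the records would come from annotated dbSNP and ESP6500 variant calls rather than a hand-written list.
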
Key properties
Variants detected in FLAGS tend to be predicted as less deleterious
FLAGS tend to be reported in PubMed and associated with disease phenotypes
FLAGS have significantly longer coding lengths, higher average dN/dS ratios, and more paralogs than genes from OMIM and HGMD
Some FLAGS have recently been implicated in rare Mendelian disorders
FLAGS are less likely to be disease-associated
The authors also made a word cloud with the Tagxedo tool (http://www.tagxedo.com/).
The FLAGS gene list is as follows:
The first column is the NCBI Entrez gene ID, the second is the gene symbol, and the third is the full gene name.
Entrez ID | Symbol | Gene name |
---|---|---|
154664 | ABCA13 | ATP-binding cassette, sub-family A (ABC1), member 13 |
24 | ABCA4 | ATP-binding cassette, sub-family A (ABC1), member 4 |
10347 | ABCA7 | ATP-binding cassette, sub-family A (ABC1), member 7 |
79026 | AHNAK | AHNAK nucleoprotein |
113146 | AHNAK2 | AHNAK nucleoprotein 2 |
11214 | AKAP13 | A kinase (PRKA) anchor protein 13 |
7840 | ALMS1 | Alstrom syndrome protein 1 |
288 | ANK3 | ankyrin 3, node of Ranvier (ankyrin G) |
338 | APOB | apolipoprotein B |
259266 | ASPM | abnormal spindle microtubule assembly |
675 | BRCA2 | breast cancer 2, early onset |
8927 | BSN | bassoon presynaptic cytomatrix protein |
64072 | CDH23 | cadherin-related 23 |
9620 | CELSR1 | cadherin, EGF LAG seven-pass G-type receptor 1 |
202333 | CMYA5 | cardiomyopathy associated 5 |
1293 | COL6A3 | collagen, type VI, alpha 3 |
1294 | COL7A1 | collagen, type VII, alpha 1 |
64478 | CSMD1 | CUB and Sushi multiple domains 1 |
8029 | CUBN | cubilin (intrinsic factor-cobalamin receptor) |
25981 | DNAH1 | dynein, axonemal, heavy chain 1 |
196385 | DNAH10 | dynein, axonemal, heavy chain 10 |
8701 | DNAH11 | dynein, axonemal, heavy chain 11 |
8632 | DNAH17 | dynein, axonemal, heavy chain 17 |
146754 | DNAH2 | dynein, axonemal, heavy chain 2 |
55567 | DNAH3 | dynein, axonemal, heavy chain 3 |
1767 | DNAH5 | dynein, axonemal, heavy chain 5 |
56171 | DNAH7 | dynein, axonemal, heavy chain 7 |
1769 | DNAH8 | dynein, axonemal, heavy chain 8 |
1770 | DNAH9 | dynein, axonemal, heavy chain 9 |
667 | DST | dystonin |
79659 | DYNC2H1 | dynein, cytoplasmic 2, heavy chain 1 |
83481 | EPPK1 | epiplakin 1 |
2195 | FAT1 | FAT atypical cadherin 1 |
2196 | FAT2 | FAT atypical cadherin 2 |
120114 | FAT3 | FAT atypical cadherin 3 |
79633 | FAT4 | FAT atypical cadherin 4 |
84467 | FBN3 | fibrillin 3 |
8857 | FCGBP | Fc fragment of IgG binding protein |
2312 | FLG | filaggrin |
80144 | FRAS1 | Fraser extracellular matrix complex subunit 1 |
341640 | FREM2 | FRAS1 related extracellular matrix protein 2 |
NA | GPR98 | |
85441 | HELZ2 | helicase with zinc finger 2, transcriptional coactivator |
8924 | HERC2 | HECT and RLD domain containing E3 ubiquitin protein ligase 2 |
83872 | HMCN1 | hemicentin 1 |
388697 | HRNR | hornerin |
3339 | HSPG2 | heparan sulfate proteoglycan 2 |
58508 | KMT2C | lysine (K)-specific methyltransferase 2C |
8085 | KMT2D | lysine (K)-specific methyltransferase 2D |
284217 | LAMA1 | laminin, alpha 1 |
3908 | LAMA2 | laminin, alpha 2 |
3909 | LAMA3 | laminin, alpha 3 |
3911 | LAMA5 | laminin, alpha 5 |
4035 | LRP1 | low density lipoprotein receptor-related protein 1 |
53353 | LRP1B | low density lipoprotein receptor-related protein 1B |
4036 | LRP2 | low density lipoprotein receptor-related protein 2 |
23499 | MACF1 | microtubule-actin crosslinking factor 1 |
23195 | MDN1 | midasin AAA ATPase 1 |
4288 | MKI67 | marker of proliferation Ki-67 |
94025 | MUC16 | mucin 16, cell surface associated |
140453 | MUC17 | mucin 17, cell surface associated |
4583 | MUC2 | mucin 2, oligomeric mucus/gel-forming |
727897 | MUC5B | mucin 5B, oligomeric mucus/gel-forming |
51168 | MYO15A | myosin XVA |
9172 | MYOM2 | myomesin 2 |
4703 | NEB | nebulin |
84033 | OBSCN | obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF |
27445 | PCLO | piccolo presynaptic cytomatrix protein |
5116 | PCNT | pericentrin |
9659 | PDE4DIP | phosphodiesterase 4D interacting protein |
5310 | PKD1 | polycystic kidney disease 1 (autosomal dominant) |
168507 | PKD1L1 | polycystic kidney disease 1 like 1 |
5314 | PKHD1 | polycystic kidney and hepatic disease 1 (autosomal recessive) |
93035 | PKHD1L1 | polycystic kidney and hepatic disease 1 (autosomal recessive)-like 1 |
5339 | PLEC | plectin |
57674 | RNF213 | ring finger protein 213 |
94137 | RP1L1 | retinitis pigmentosa 1-like 1 |
6261 | RYR1 | ryanodine receptor 1 (skeletal) |
6263 | RYR3 | ryanodine receptor 3 |
26278 | SACS | sacsin molecular chaperone |
51332 | SPTBN5 | spectrin, beta, non-erythrocytic 5 |
23524 | SRRM2 | serine/arginine repetitive matrix 2 |
23166 | STAB1 | stabilin 1 |
55576 | STAB2 | stabilin 2 |
23345 | SYNE1 | spectrin repeat containing, nuclear envelope 1 |
23224 | SYNE2 | spectrin repeat containing, nuclear envelope 2 |
10579 | TACC2 | transforming, acidic coiled-coil containing protein 2 |
7011 | TEP1 | telomerase-associated protein 1 |
7038 | TG | thyroglobulin |
7273 | TTN | titin |
23352 | UBR4 | ubiquitin protein ligase E3 component n-recognin 4 |
7399 | USH2A | Usher syndrome 2A (autosomal recessive, mild) |
7402 | UTRN | utrophin |
157680 | VPS13B | vacuolar protein sorting 13 homolog B (yeast) |
54832 | VPS13C | vacuolar protein sorting 13 homolog C (S. cerevisiae) |
55187 | VPS13D | vacuolar protein sorting 13 homolog D (S. cerevisiae) |
7450 | VWF | von Willebrand factor |
129446 | XIRP2 | xin actin binding repeat containing 2 |
7455 | ZAN | zonadhesin (gene/pseudogene) |
463 | ZFHX3 | zinc finger homeobox 3 |
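
One common way to use a list like this is to mark candidate genes from an exome analysis that fall in FLAGS, so they get a closer look before being reported. The sketch below shows that idea under stated assumptions: `FLAGS_SUBSET` holds only a handful of symbols copied from the table above, and the function name and candidate list are hypothetical.

```python
# A small subset of the FLAGS symbols from the table above;
# a real analysis would load all 100 symbols.
FLAGS_SUBSET = {"TTN", "MUC16", "OBSCN", "AHNAK2", "SYNE1", "FLG"}

def split_by_flags(candidates, flags=FLAGS_SUBSET):
    """Split candidate gene symbols into (flagged, remaining), preserving order."""
    # Symbols are compared in upper case so that "muc16" still matches "MUC16".
    flagged = [g for g in candidates if g.upper() in flags]
    remaining = [g for g in candidates if g.upper() not in flags]
    return flagged, remaining

candidates = ["TTN", "CFTR", "muc16", "PAX6"]  # hypothetical candidate list
flagged, remaining = split_by_flags(candidates)
print(flagged)    # ['TTN', 'muc16']
print(remaining)  # ['CFTR', 'PAX6']
```

Flagged genes are set aside for review rather than discarded, since the list itself contains genes implicated in Mendelian disorders.
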