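Oct 16

A bioinformatician's path to learning MySQL

Posted on October 16, 2015 by ulwvfje

I have always known that MySQL is genuinely useful, even in bioinformatics. I had read quite a few MySQL tutorials on and off, but never had a chance to apply them. Applying something is the best way to learn it, and since I needed MySQL these past few days, I went back over how a bioinformatician should approach learning it.

Start with this Chinese tutorial: http://www.cnblogs.com/mr-wid/archive/2013/05/09/3068229.html

Then look up a handful of tips: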

https://dev.mysql.com/doc/refman/5.1/en/counting-rows.html

http://www.w3schools.com/sql/sql_func_count.asp

https://dev.mysql.com/doc/refman/5.0/en/pattern-matching.html

http://hahaxiao.techweb.com.cn/archives/477.html

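That is about enough to get started.

We are not using the database to build web pages, so all we need is to query data in public databases. Most people would simply query through a web interface, or bulk-download from FTP and write their own scripts to search the files. I used to think the same way, which is why MySQL seemed pointless to me: anything it could do, I could do with a script. But anything that has become this popular must have real advantages.

As I see it, the advantage of MySQL is that you do not need to store a large number of files; you query as you go. If we wanted to mirror a database locally, we would have to create a pile of folders for refGene records, Entrez Gene records, transcripts, exons, and so on, each holding many files, and all of it duplicated per species. That is a lot of trouble, whereas with a database none of it is necessary.

For example, we can connect to the UCSC database (provided the mysql command can be run on your machine and you have network access):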

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A

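It is that simple: you are now logged in to the UCSC database remotely through mysql, and you can run show databases; or use hg19; and so on.

There are more than two hundred databases, mostly different species and assembly versions. The hg19 database alone contains over ten thousand tables, holding comprehensive information about hg19.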
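The row-counting and pattern-matching tips linked earlier can be practiced offline before touching a remote server. The following is a minimal sketch, not the UCSC schema: it assumes an in-memory SQLite database as a local stand-in for MySQL, a simplified refGene-like table with only six columns, and toy rows whose identifiers and coordinates are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refGene (name TEXT, chrom TEXT, strand TEXT, "
    "txStart INTEGER, txEnd INTEGER, name2 TEXT)"
)

# Toy rows: invented identifiers and coordinates, not real annotation.
toy_rows = [
    ("NM_TOY001", "chr1", "+", 1000, 5000, "GENEA"),
    ("NM_TOY002", "chr1", "-", 7000, 9000, "GENEB"),
    ("NM_TOY003", "chr2", "+", 200, 800, "GENEA2"),
    ("NR_TOY004", "chr2", "+", 3000, 4000, "LNCX"),
]
conn.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?, ?, ?)", toy_rows)

# Counting rows: how many transcripts sit on each chromosome?
per_chrom = conn.execute(
    "SELECT chrom, COUNT(*) FROM refGene GROUP BY chrom ORDER BY chrom"
).fetchall()

# Pattern matching: % in LIKE matches any run of characters.
gene_hits = [row[0] for row in conn.execute(
    "SELECT name2 FROM refGene WHERE name2 LIKE 'GENE%' ORDER BY name2"
)]

print(per_chrom)
print(gene_hits)
conn.close()
```

The same SELECT statements should carry over to a real MySQL session with little change, although the real tables have many more columns.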

There are many other public databases to practice on.
Source: https://www.biostars.org/p/474/#9095

for example, I would cite:

UCSC http://genome.ucsc.edu/FAQ/FAQdownloads#download29
ENSEMBL http://uswest.ensembl.org/info/data/mysql.html
GO http://www.geneontology.org/GO.database.shtml#mirrors

1000 Genomes: since June 16, 2011: http://www.1000genomes.org/public-ensembl-mysql-instance

mysql -h mysql-db.1000genomes.org -u anonymous -P 4272

Flybase has direct access to its postgres chado database.
http://flybase.org/forums/viewtopic.php?f=14&t=114
hostname: flybase.org port: 5432 username: flybase password: no password database name: flybase
e.g. psql -h flybase.org -U flybase flybase

mysql -h database.nencki-genomics.org -u public
mysql -h useastdb.ensembl.org -u anonymous -P 5306

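You can log in to any of them to see what is inside and to practice MySQL syntax, but of the four CRUD operations (create, delete, update, query) only querying is available to you.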
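The connection commands above differ only in host, user, and port, so they can be assembled by one small helper. This is a sketch under assumptions: the function name build_mysql_command is hypothetical, and it only builds the argument list for the standard mysql client (using its --host, --user, --port, --database, --execute, and -A options) without contacting any server.

```python
import shlex


def build_mysql_command(host, user, port=None, database=None, query=None):
    """Return the argument list for a mysql client session (hypothetical helper)."""
    argv = ["mysql", f"--host={host}", f"--user={user}", "-A"]
    if port is not None:
        argv.append(f"--port={port}")
    if database is not None:
        argv.append(f"--database={database}")
    if query is not None:
        # --execute runs one statement and exits instead of opening a prompt.
        argv.append(f"--execute={query}")
    return argv


# Hosts, users, and ports are the ones quoted in the post.
ucsc = build_mysql_command("genome-mysql.cse.ucsc.edu", "genome",
                           database="hg19", query="SHOW TABLES LIKE 'ref%'")
ensembl = build_mysql_command("useastdb.ensembl.org", "anonymous", port=5306)

print(shlex.join(ucsc))
print(shlex.join(ensembl))
```

To actually run one of these, the list could be passed to subprocess.run on a machine that has the mysql client installed and network access.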

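We can also connect to these databases from R, Perl, or Python, which works quite well. I currently lean toward R.

So I skimmed the manual of the RMySQL package and connected successfully: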

# Equivalent command-line connection:
# mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A
# The -A flag is optional but is recommended for speed.

library(RMySQL)

my.host <- "genome-mysql.cse.ucsc.edu"
my.user <- "genome"
my.password <- ""   # the public genome account has no password
my.db <- "hg19"

# There are 203 databases, such as hg18, hg38, mm9, mm10, ce10.
con <- dbConnect(MySQL(), host=my.host, user=my.user, password=my.password, dbname=my.db)

dbListTables(con)  # there are 11016 tables in this hg19 database

dbDisconnect(con)  # close the connection when done

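Simple, isn't it? As long as you study carefully, these practical tasks are all fairly easy.

The following resource is also good. It covers connecting to databases from R, Perl, or Python, and is quite thorough:

http://bioinformatics.risha.me/category/mysql/

And if you want to see how MySQL is applied in bioinformatics, there is plenty more study material below: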

http://www.biomedcentral.com/1471-2105/11/342

http://bioinformatics.oxfordjournals.org/content/28/14/1947.full.pdf

https://rostlab.org/owiki/images/7/73/Protocol_goldberg.pdf

http://webdoc.nyumc.org/nyumc/files/sun-lab/attachments/CPBI.Ch9.Biol.DB.pdf

http://www.bsi.umn.edu/resources/perl3.pdf

http://www.cs.toronto.edu/~leijiang/ta/mie453/tutorial/tut5/

This course is fairly comprehensive: Biological Databases in Bioinformatics (BioE 594)

http://bioinformatics.bioe.uic.edu/online/BioE594_db.shtml

For more advanced study, look at a concrete example, the design of the GO database: http://geneontology.org/page/lead-database-schema

Judging from these slides, Python is much better than Perl: http://www.personal.psu.edu/iua1/courses/files/2010/week15.pdf

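Oct 16

You can even sell TCGA data, as long as you do a little analysis on it

Posted on October 16, 2015 by ulwvfje

This was an eye-opener: so you can make money this way too.

In 2011 the authors of this database published a paper on finding fusion genes: Edgren, Henrik, et al. "Identification of fusion genes in breast cancer by paired-end RNA-sequencing." Genome Biol 12.1 (2011): R6.

Building on that, they processed all the cancer sample data in the TCGA project, obtained a fusion gene dataset, and now sell it:

http://medisapiens.com/products/fusion-scout/fusionscout-cancer-datasets (the site seems to need a VPN to open from China)

The price is as high as ten thousand euros, or more than 70,000 RMB, for very little outlay. The TCGA data itself is public and free, yet they make money from it after a round of secondary processing, which does not sit well with me.

So far they have processed 7625 cancer samples from the TCGA project, built a fusion gene dataset covering 28 cancer types, and packaged it as a product called FusionSCOUT.

The prices are as follows: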

Pricing of FusionSCOUT datasets:

Single gene in one cancer set 490€ / 580$ per dataset
Single gene fusions across all cancers 4900€ / 5800$ per dataset
Individual cancer set 990 € / 1250 $ per dataset
Full TCGA dataset 9900€ / 12500$ per dataset

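The site describes the product as follows, claiming that 3500 research groups already use their data. I think that is pure bragging: the paper itself has only a hundred-odd citations, and 3500 purchases of this one product would make them fabulously rich, which is hard to believe. Besides, the website is poor and very slow to reach from China, which means losing all the deep-pocketed customers there, so how could it have 3500 sales? Ridiculous!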

One of the latest therapeutics angles in the fight against cancer is fusion genes and their regulation. To aid in fusion gene research and reveal the multitude of gene fusion event in cancer samples MediSapiens has developed a proprietary FusionSCOUT pipeline for identifying fusion genes from RNA sequencing datasets.

Currently we have analysed 7625 tumour samples from the TCGA project building a fusion gene dataset covering 28 different cancers within the TCGA project which can be accessed through our FusionSCOUT product.

Using this pipeline, we have discovered 3930 samples with gene fusions with 9667 different fusion genes. We've discovered numerous novel gene fusions as well as new cancer types in which previously known fusions appear.

You can now purchase these gene fusion datasets with a few mouse clicks and get the world's most comprehensive gene fusions from cancer sets within days.

FusionSCOUT cancer Reports

With FusionSCOUT you can access the full listings of all fusion genes in specific cancer datasets. Find new leads for possible cause of the cancer, examine the pathways that are affected by different fusions, stratify patients by shared fusion genes or search for potential target for drugs and companion diagnostics.

Once you purchase a FusionSCOUT dataset we will send you a detailed report with information on the fused genes, sample ID from the TCGA dataset, fusion frequencies across the dataset as well as fusion mRNA sequences and lists of protein domains present in the fusion transcripts.

By ordering the MediSapiens FusionSCOUT dataset, you'll get:

A list of all gene fusions that involve your gene of interest, across all TCGA cancer types
TCGA sample IDs for the samples with fusions
Exact exon junctions for the fusions, including alternatively spliced variants and data on whether reading frame is retained
Detailed list of protein domains retained in the fusion genes
cDNA sequence for the fusion mRNAs

Contact us to access the most up-to-date and comprehensive datasets of fusion gene events in different cancers! contact@medisapiens.com

Check out also our Fusion Gene Detection pipeline service for your samples!

Dataset missing? Email us and we'll add your favorite dataset to FusionSCOUT!

FusionSCOUT Cancer sets, March 2015

Cancer type	Number of samples	Number of fusion genes
Acute Myeloid Leukemia, LAML	153	69
Adrenocortical carcinoma, ACC	79	115
Bladder Urothelial Carcinoma, BLCA	273	473
Brain Lower Grade Glioma, LGG	467	309
Breast Invasive Carcinoma, BRCA	1029	3267
Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma, CESC	195	190
Colon Adenocarcinoma, COAD	287	212
Glioblastoma multiforme, GBM	170	379
Head and Neck Squamous Cell Carcinoma, HNSC	412	386
Kidney Chromophobe, KICH	66	19
Kidney Renal Clear Cell Carcinoma, KIRC	523	217
Kidney Renal Papillary Cell Carcinoma, KIRP	226	145
Liver Hepatocellular Carcinoma, LIHC	198	317
Lung Adenocarcinoma, LUAD	456	991
Lung Squamous Cell Carcinoma, LUSC	482	1374
Lymphoid Neoplasm Diffuse Large B-cell Lymphoma, DLBC	28	18
Mesothelioma, MESO	36	26
Ovarian Serous Cystadenocarcinoma, OV	420	1166
Pancreatic Adenocarcinoma, PAAD	84	46
Pheochromocytoma and Paraganglioma, PCPG	184	83
Prostate Adenocarcinoma, PRAD	336	859
Rectum Adenocarcinoma, READ	85	74
Sarcoma, SARC	161	799
Skin Cutaneous Melanoma, SKCM	355	620
Stomach Adenocarcinoma, STAD	190	311
Thyroid Carcinoma, THCA	506	195
Uterine Carcinosarcoma, UCS	57	229
Uterine Corpus Endometrial Carcinoma, UCEC	167	422
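The per-cancer sample counts in the table can be totaled to check them against the 7625 tumour samples quoted in the vendor's description. A small sketch, with the counts copied from the table and keyed by the TCGA abbreviations (the variable names are my own):

```python
# Number of samples per cancer set, copied from the FusionSCOUT table above.
samples_per_cancer = {
    "LAML": 153, "ACC": 79, "BLCA": 273, "LGG": 467, "BRCA": 1029,
    "CESC": 195, "COAD": 287, "GBM": 170, "HNSC": 412, "KICH": 66,
    "KIRC": 523, "KIRP": 226, "LIHC": 198, "LUAD": 456, "LUSC": 482,
    "DLBC": 28, "MESO": 36, "OV": 420, "PAAD": 84, "PCPG": 184,
    "PRAD": 336, "READ": 85, "SARC": 161, "SKCM": 355, "STAD": 190,
    "THCA": 506, "UCS": 57, "UCEC": 167,
}

n_cancers = len(samples_per_cancer)
total_samples = sum(samples_per_cancer.values())

print(n_cancers, total_samples)  # prints: 28 7625
```

The total agrees with the 28 cancers and 7625 samples stated in the English product text.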

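Oct 14

Several well-known international conferences related to bioinformatics

Posted on October 14, 2015 by ulwvfje

The conferences are:

ASHG: Annual Meeting of the American Society of Human Genetics

AGBT: Advances in Genome Biology & Technology

ASM: annual meeting of the American Society for Microbiology

ASHI: The American Society for Histocompatibility and Immunogenetics

BOSC: Bioinformatics Open Source Conference

ISMB/ECCB

ACMG: The ACMG Annual Clinical Genetics Meeting

BoG: annual Biology of Genomes meeting at Cold Spring Harbor

The list above is in no particular order.

The annual meeting of the American Society of Human Genetics (ASHG) is a major event in genetics and currently the largest human genetics conference. The 2015 meeting was held October 6-10 in Baltimore, Maryland, and drew more than 6500 scientists, who presented and discussed the latest advances in every area of human genetics.

Conference website: http://www.ashg.org/

It is a grand event and highly regarded in the field!

The slides can all be downloaded, although from China you need a VPN: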

https://storify.com/andrewsu/ashg14-speaker-slides

http://erlichlab.wi.mit.edu/ashg2014/

Advances in Genome Biology and Technology (AGBT)

Introductions in Chinese:

www.biodiscover.com/news/politics/117417.html

http://www.lifeomics.com/?p=23197

The 23rd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2015)

14th Annual European Conference on Computational Biology (ECCB 2015)

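This is another long-established conference. Its website is https://www.iscb.org/

The 2015 conference materials can be downloaded directly:

https://www.iscb.org/images/stories/ismbeccb2015/downloads/ISMBECCB15-Program-web.pdf

The ASM meeting is a bit less substantial:

Conference website: http://www.asm.org/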

WHO WE ARE

The American Society for Microbiology (ASM) is the oldest and largest single life science membership organization in the world. Membership has grown from 59 scientists in 1899 to more than 39,000 members today, with more than one third located outside the United States. The members represent all aspects of the microbial sciences including microbiology educators.

The mission of ASM is to promote and advance the microbial sciences.

ASM accomplishes this mission through a variety of products, services and activities.

We provide a platform for sharing the latest scientific discoveries through our books, journals, meetings and conferences.
We help strengthen sustainable health systems around the world through our laboratory capacity building and global engagement programs.
We advance careers through our professional development programs and certifications.
We train and inspire the next generation of scientists through our outreach and educational programs.

ASM members have a passion for the microbial sciences, a desire to connect with their colleagues and a drive to be involved with the profession. Whether it is publishing in an ASM Journal, attending an ASM meeting or volunteering on one of the Society's many boards and committees.

Big parts of our everyday lives, from energy production, waste recycling, new sources of food, new drug development and infectious diseases to environmental problems and industrial processes-are studied in the microbial sciences.

Microbiology boasts some of the most illustrious names in the history of science--Pasteur, Koch, Fleming, Leeuwenhoek, Lister, Jenner and Salk--and some of the greatest achievements for mankind. Within the 20th century, a third of all Nobel Prizes in Physiology or Medicine have been awarded to microbiologists.

ASHI is mainly about immunology:

Website: http://www.ashi-hla.org/

The ASHI 41st Annual Meeting site is now live, for the latest updates visit 2015.ashi-hla.org.

About ASHI

The American Society for Histocompatibility and Immunogenetics (ASHI) is a not-for-profit association of clinical and research professionals including immunologists, geneticists, molecular biologists, transplant physicians and surgeons, pathologists and technologists. As a professional society involved in histocompatibility, immunogenetics and transplantation, ASHI is dedicated to advancing the science and application of histocompatibility and immunogenetics; providing a forum for the exchange of information; and advocating the highest standards of laboratory testing in the interest of optimal patient care.

BOSC: Bioinformatics Open Source Conference

Wikipedia has a detailed description:

The Bioinformatics Open Source Conference (BOSC) is an academic conference on open source programming in bioinformatics organised by the Open Bioinformatics Foundation. The conference has been held annually since 2000 and is run as a two-day satellite meeting preceding the Intelligent Systems for Molecular Biology (ISMB) conference.

annual Biology of Genomes (BoG) meeting

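Conference website: http://meetings.cshl.edu

Cold Spring Harbor Laboratory is a truly impressive institution and hosts countless meetings.

http://meetings.cshl.edu/meetings.aspx?meet=genome&year=15

A blog recommendation while we are at it: http://robpatro.com/blog/?p=248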

https://liorpachter.wordpress.com/2015/05/10/near-optimal-rna-seq-quantification-with-kallisto/

http://www.homolog.us/blogs/blog/2015/07/10/will-i-use-kallisto-definitely-most-likely-and-never/

http://blog.genohub.com/biology-of-genomes-meeting-2013/

The ACMG meeting is clinically oriented and gets less coverage.

Conference website: http://www.acmgmeeting.net/

ABOUT

The ACMG Annual Clinical Genetics Meeting provides genetics professionals with the opportunity to learn how genetics and genomics are being integrated into medical or clinical practice. The ACMG Annual Meeting Program Committee has developed a high caliber scientific program that will present the latest developments and research in clinical genetics and genomics

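Oct 09

Comparing VCF variant data against published variants

Posted on October 9, 2015 by ulwvfje

After calling SNPs from NGS data, the output is usually in VCF format, with one variant per line, for example: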

20 2451451 . G T 1939.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-10.134;DP=239;Dels=0.00;FS=2.276;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.258;QD=8.12;ReadPosRankSum=0.823;SOR=0.870 GT:AD:DP:GQ:PL 0/1:150,89:239:99:1968,0,3874

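The first few columns record which chromosome the variant is on and its coordinate on that chromosome, the base in our reference genome at that site, and the base it was changed to in our sequencing data.

The last two columns are the FORMAT column and the per-sample values it describes: here the genotype (GT), the read depths supporting the ref and alt alleles (AD), the total depth (DP), the genotype quality (GQ), and the genotype likelihoods (PL).

Column 8 (INFO) is the most complex. It can hold up to several hundred annotations, depending on which software called the SNPs and which software annotated them afterwards.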
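To make the column layout concrete, the example record shown earlier can be split into its fields. This is a minimal sketch, not a full VCF parser: the function name parse_vcf_record is my own, it assumes a single-sample record, and it falls back to splitting on any whitespace because the record is shown with spaces here, whereas real VCF files are tab-delimited.

```python
def parse_vcf_record(line):
    """Split one single-sample VCF data line into a dictionary (illustrative helper)."""
    line = line.rstrip("\n")
    fields = line.split("\t") if "\t" in line else line.split()
    chrom, pos, var_id, ref, alt, qual, flt, info, fmt, sample = fields[:10]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True

    # FORMAT names the colon-separated values in the sample column.
    sample_dict = dict(zip(fmt.split(":"), sample.split(":")))

    return {
        "chrom": chrom, "pos": int(pos), "id": var_id, "ref": ref, "alt": alt,
        "qual": float(qual), "filter": flt, "info": info_dict, "sample": sample_dict,
    }


record = (
    "20 2451451 . G T 1939.77 . "
    "AC=1;AF=0.500;AN=2;BaseQRankSum=-10.134;DP=239;Dels=0.00;FS=2.276;"
    "HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.258;"
    "QD=8.12;ReadPosRankSum=0.823;SOR=0.870 "
    "GT:AD:DP:GQ:PL 0/1:150,89:239:99:1968,0,3874"
)
rec = parse_vcf_record(record)
print(rec["chrom"], rec["pos"], rec["ref"], rec["alt"])
print(rec["info"]["DP"], rec["sample"]["GT"], rec["sample"]["AD"])
```

Values are kept as strings inside the INFO and sample dictionaries, since their types depend on the header definitions that this sketch does not read.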

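Next we want to find out whether the variants we found have already been reported in other published databases, so there are many public databases that can be downloaded for comparison. The ones I have come across are listed below; a complete download is probably around 0.5 TB.

dbsnp144 (provided by NCBI, the most authoritative)

cgi69

ExAC.vcf.gz (the Exome Aggregation Consortium, provided by the Broad Institute)

Cosmic_v73.ann.vcf.gz (a collection of cancer mutations)

finalTCGA.vcf.gz (the TCGA project, also cancer-related)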

icgc.vcf.gz

dbNSFP2.6vcf

SCLP.ann.vcf.gz

CCLE.ann.vcf.gz

ESP6500-SIv2.vcf.gz （Variants from the Exome Sequencing Project (ESP)）

adni-sum

safs-sum.indel.vcf.gz

gonl.vcf.gz

ssm.vcf.gz

ssi.vcf.gz

uk10k.vcf.gz

1000g-ph3v5.gff.gz (the 1000 Genomes Project)

gwasCatalog.gff.gz

phewascatalog.gff.gz

dbgap-gwas.gff.gz

interproDomain.gff.gz

clinvar.gff.gz

RegulomeDB.gff.gz

CancerGAMAdb.gff.gz
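The comparison itself comes down to checking whether each called variant's chromosome, position, ref, and alt appear in a database file. Below is a minimal sketch of that lookup under stated assumptions: the function names are hypothetical, the database lines are toy records with invented IDs rather than content from any of the files listed, and matching is exact, so indels that are represented differently in two files would be missed without prior normalization.

```python
def variant_keys(vcf_line):
    """Return one (chrom, pos, ref, alt) key per ALT allele on a VCF data line."""
    chrom, pos, _id, ref, alts = vcf_line.split()[:5]
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # treat "chr20" and "20" as the same chromosome
    return {(chrom, int(pos), ref, alt) for alt in alts.split(",")}


def load_known_variants(lines):
    """Collect the keys of all variants in a database VCF, skipping header lines."""
    known = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            known |= variant_keys(line)
    return known


def annotate_calls(call_lines, known):
    """Pair every called variant with True if it is in the database, else False."""
    results = []
    for line in call_lines:
        if line.strip() and not line.startswith("#"):
            for key in sorted(variant_keys(line)):
                results.append((key, key in known))
    return results


# Toy database records (invented IDs) and toy calls; only the first call
# reuses the position from the example record in this post.
toy_database = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO",
    "chr20 2451451 toy_id_1 G T . . .",
    "20 3000000 toy_id_2 A C,G . . .",
]
my_calls = [
    "20 2451451 . G T 1939.77 . DP=239",
    "20 2500000 . C A 50.00 . DP=30",
    "20 3000000 . A G 80.00 . DP=45",
]

known = load_known_variants(toy_database)
for key, seen in annotate_calls(my_calls, known):
    print(key, "known" if seen else "novel")
```

For files of the sizes mentioned above, an in-memory set is unlikely to scale, and an indexed lookup would probably be needed instead.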

Sep 29

A professor to follow at Washington University School of Medicine: Obi L. Griffith

His homepage: http://www.obigriffith.org/

One of his better-known contributions is www.rnaseq.wiki

He is very active on the Biostars bioinformatics forum.

His courses include Molecular Basis of Cancer (BIO5288) and Genetics and Genomics of Disease (BIO5487) at Washington University School of Medicine.

I was a TA for Genome Analysis (MEDG505) and the bioinformatics section of Advanced Human Molecular Genetics (MEDG520) and a guest instructor for Cell Biology For Biomedical Engineering Graduate Students (APSC552), Cell and Organismal Biology (BIOL111) and Cell Biology (BIOL200) at UBC.

A professor to follow at Washington University School of Medicine: Malachi Griffith

His personal homepage: http://www.malachigriffith.org/index.htm

His GitHub page: https://github.com/malachig

WashU TGI Faculty page: Profile
Linked In: Profile
Twitter: Feed
Google Scholar: Citations
Research Gate: Profile
Scopus: Profile
Open Research ID: Profile
Github: Profile
BioStar: Profile
SeqAnswers: Profile
Code Academy: Profile
Iterative Genomics Consulting: Company website
Flickr: Photostream
www.dgidb.org
www.alexaplatform.org

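A professor to follow at McGill University: Pablo Cingolani

He is the author of SnpEff.

His GitHub: https://github.com/pcingola

He currently works at McGill University.

A worked example of expression microarray data processing

Posted on September 25, 2015 by ulwvfje

The first part of this example covers:

How to download GEO data with an R package (single-platform series only; for other platforms the code below needs to be modified)

How to normalize GEO microarray data and obtain an expression matrix

How to run a differential expression analysis with the limma package

How to annotate the resulting differentially expressed genes with GO and KEGG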


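Sep 09

Differential gene expression analysis of Affymetrix gene expression microarray data

Posted on September 9, 2015 by ulwvfje

I mostly followed one differential-analysis tutorial that is very detailed and thorough. I will first outline the tutorial and then post my own code.

GEO has just three levels of records: Sample, Platform, and Series.

There are now 14,927 platforms, covering arrays from the mainstream vendors such as Affymetrix, Agilent, and Illumina, their various applications (SNP, SNV, GWAS, and so on), and many different organisms (human, mouse, rat).

This workflow applies only to Affymetrix gene expression arrays.

The table of contents is:

Microarray (Affymetrix) analysis 1: array quality assessment
Microarray (Affymetrix) analysis 2: data preprocessing
Microarray (Affymetrix) analysis 3: finding differentially expressed genes
Microarray (Affymetrix) analysis 4: GO and KEGG analysis
Microarray (Affymetrix) analysis 5: cluster analysis

Because that post was itself a repost, its links have gone dead. The current links are: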

http://seuzsl.blog.163.com/blog/static/2187980520134910258605/

http://seuzsl.blog.163.com/blog/static/2187980520134910339429/

http://seuzsl.blog.163.com/blog/static/2187980520134910354705/

http://seuzsl.blog.163.com/blog/static/218798052013491049169/

http://seuzsl.blog.163.com/blog/static/2187980520134910425132/

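Searching again by the section titles will certainly turn up the content; dead links are only to be expected.

I organized and re-annotated the details in a Youdao Cloud note:

http://note.youdao.com/share/?id=e24163d717caa31265a449ba227af491&type=note

Anyone who reads this tutorial carefully should be able to get started with differential analysis of microarray data. But this is only one approach to differential analysis, and a very dated one at that.

Packages built for high-throughput sequencing data, such as DESeq and edgeR, are more popular now. Even for microarray data from more than ten years ago there is no need to download raw CEL files; you can directly download each project's expression matrix, the Series Matrix File(s).

Then in R, read.table with the right parameters will read it straight in!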
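The "right parameters" mostly amount to skipping the metadata lines that start with "!" and reading the tab-delimited table between the !series_matrix_table_begin and !series_matrix_table_end markers. Here is a sketch of that logic in Python rather than R, run on a toy snippet: the function name read_series_matrix, the sample and probe identifiers, and the values are all invented, and treating empty cells as missing is an assumption.

```python
import csv
import io


def read_series_matrix(handle):
    """Return (sample_ids, {probe_id: [values]}) from a GEO Series Matrix file."""
    in_table = False
    sample_ids = None
    matrix = {}
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
            continue
        if line.startswith("!series_matrix_table_end"):
            break
        if not in_table or not line:
            continue  # metadata lines before the table are ignored here
        cells = next(csv.reader([line], delimiter="\t"))  # also strips the quotes
        if sample_ids is None:
            sample_ids = cells[1:]  # header row: "ID_REF" followed by sample IDs
        else:
            # Assumption: an empty cell means a missing value.
            matrix[cells[0]] = [float(x) if x else None for x in cells[1:]]
    return sample_ids, matrix


# Toy content with invented identifiers and values.
toy_file = (
    '!Series_title\t"toy series"\n'
    "!series_matrix_table_begin\n"
    '"ID_REF"\t"GSM_TOY1"\t"GSM_TOY2"\n'
    '"probe_1"\t7.5\t8.25\n'
    '"probe_2"\t3.0\t2.5\n'
    "!series_matrix_table_end\n"
)
samples, matrix = read_series_matrix(io.StringIO(toy_file))
print(samples)
print(matrix)
```

For a real download, the same function could be given a file opened with gzip.open(path, "rt"), since the series matrix files are distributed gzip-compressed.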

Sep 06

jQuery study notes

Posted on September 6, 2015 by ulwvfje

From now on I will share posts like this directly through Youdao Cloud Notes, which saves space on this free cloud server.

jQuery notes part 1: basic syntax

http://note.youdao.com/share/?id=82021515144eb4820762e9fdbc686340&type=note

jQuery notes part 2: slideshow-style effects

http://note.youdao.com/share/?id=08eb606b2084b9b0d8c9eb5ef72e3433&type=note

jQuery notes part 3: manipulating HTML elements

http://note.youdao.com/share/?id=fb8ff7deeb186adb82751838bf82cfbe&type=note

jQuery notes part 4: loops, traversal, and conditionals

http://note.youdao.com/share/?id=746ac6f1a801351f49d13cb3d7a335bf&type=note

jQuery notes part 5: Ajax

http://note.youdao.com/share/?id=0b2c6fb8c89e307ec79602e6d67e7c66&type=note

jQuery reference manual: complete function list

http://note.youdao.com/share/?id=2e926f98c9bd51b1192d309706f8c1ca&type=note

Sep 01

A collection of biostatistics study materials

Posted on September 1, 2015 by ulwvfje

http://stat.ethz.ch/education/semesters/ss2015

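Aug 29

Must-read literature for cancer research

Posted on August 29, 2015 by ulwvfje

I recently needed to learn some cancer background and came across this reading list. I think it is excellent, so I am recommending it to everyone.

Take your time; it should be possible to get through these papers in a month.

List of cancer types: http://www.cancer.gov/types
List of cancer drugs: http://www.cancer.gov/about-cancer/treatment/drugs
Almost all information about cancer can be found on this site: http://www.cancer.gov/
This includes general-audience introductions, treatment, diagnosis, prognosis, classification, drugs, prediction, and more.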

Cancer Precision Medicine: Improving Evidence in Practice - August 24, 2015

NCI-MATCH Trial Opens, AACR blog post, August 2015

NCI-MATCH launch highlights new trial design in precision-medicine era
McNeal C , JNCI, August 2015

The Cancer Genomics Resource List, 2014
Zutter MM et al. CAP Lab Improvement Program, Archives of Pathology, August 2015

Personalized medicine and economic evaluation in oncology: all theory and no practice?
Garattini L et al. Expert Rev Pharmacoecon Outcomes Res 2015 Aug 9. 1-6

Precision medicine trials bring targeted treatments to more patients, C. Helwick, ASCO Post, Jul 25

Next-generation sequencing to guide cancer therapy
Gagan J et al, Genome Medicine, July 29, 2015

Feasibility of large-scale genomic testing to facilitate enrollment onto genomically matched clinical trials.
Meric-Bernstam F et al. J. Clin. Oncol. 2015 May 26.

Brave-ish new world-what's needed to make precision oncology a practical reality.
MacConaill LE et al. JAMA Oncol 2015 Jul 16.

Genomic profiling: Building a continuum from knowledge to care
Helen C et al. JAMA Oncology, July 2015

Are we there yet?
When it comes to curing cancer, targeted therapies and genomic sequencing are helping, but we still have far to go. Genome Magazine, June 29, 2015

Artificial intelligence, big data, and cancer
Kantarjian H et al, JAMA Oncology, June 2015

Multigene panel testing in oncology practice - how should we respond?
Kurian AW et al. JAMA Oncology, June 2015

Use of whole genome sequencing for diagnosis and discovery in the cancer genetics clinic.
Foley SB et al. EBioMedicine 2015 Jan 2(1) 74-81

The future of molecular medicine: biomarkers, BATTLEs, and big data
ES Kim, ASCO University, June 2015

NCI-MATCH trial will link targeted cancer drugs to gene abnormalities

Targeted agent and profiling utilization registry study, from the American Society for Clinical Oncology

ASCO study aims to learn from patient access to targeted cancer drugs used off-label, American Society for Clinical Oncology

Improving evidence developed from population-level experience with targeted agents
McLellan M et al Issue Brief. Conference on Clinical Cancer Research November 2014

Implementing personalized cancer care.
Schilsky RL et al. Nat Rev Clin Oncol 2014 Jul (7) 432-8

Accelerating the delivery of patient-centered, high-quality cancer care.
Abrahams E et al. Clin. Cancer Res. 2015 May 15. (10) 2263-7

Next-generation clinical trials: Novel strategies to address the challenge of tumor molecular heterogeneity.
Catenacci DV et al. Mol Oncol 2015 May (5) 967-996

Cancer Precision Medicine: Improving Evidence in Practice - May 29, 2015

Diagnosis and treatment of cancer using genomics
Vockley JG et al. BMJ, May 28, 2015

Precision Medicine: Cancer and Genomics - May 12, 2015

Promise, peril seen in personalized cancer therapy, by Marie McCullough, Philadelphia Inquirer, May 10

A decision support framework for genomically informed investigational cancer therapy.
Meric-Bernstam F et al. J. Natl. Cancer Inst. 2015 Jul (7)

Divide and conquer: The molecular diagnosis of cancer, by Louis M. Staudt, National Cancer Institute, Apr 13

Health: Make precision medicine work for cancer care
To get targeted treatments to more cancer patients pair genomic data with clinical data, and make the information widely accessible, Mark A. Rubin. Nature News, Apr 15

Using somatic mutations to guide treatment decisions
Horlings H et al. JAMA Oncology, March 12, 2015

The landscape of precision cancer medicine clinical trials in the United States
Roper N et al. Cancer Treatment Reviews 2015

What is "precision medicine"? Information from the National Cancer Institute

Impact of cancer genomics on precision medicine for the treatment of cancer, from the Cancer Genome Atlas, NCI

US precision-medicine proposal sparks questions, by Sara Reardon, Nature News, Jan 22

Obama's 'precision medicine' means gene mapping, NBC News, Jan 21

What is President Obama's 'precision medicine' plan, and how might it help you? By Lenny Bernstein, Jan 21

Recent reviews

Companion diagnostics: the key to personalized medicine.
Jørgensen JT. Expert Rev Mol Diagn. 2015 Feb;15(2):153-6

Promoting precision cancer medicine through a community-driven knowledgebase.
Geifman N, et al. J Pers Med. 2014 Dec 15;4(4):475-88.

Toward a prostate cancer precision medicine.
Rubin MA. Urol Oncol. 2014 Nov 20.

Prioritizing targets for precision cancer medicine.
Andre F, et al. Ann Oncol. 2014 Dec;25(12):2295-303

Toward precision medicine with next-generation EGFR inhibitors in non-small-cell lung cancer.
Yap TA, Popat S. Pharmgenomics Pers Med. 2014 Sep 19;7:285-95.

Genomically driven precision medicine to improve outcomes in anaplastic thyroid cancer.
Pinto N, et al. J Oncol. 2014;936285

Translating genomics for precision cancer medicine.
Roychowdhury S, Chinnaiyan AM. Annu Rev Genomics Hum Genet. 2014;15:395-415

The Cancer Genome Atlas: Accomplishments and Future - April 3, 2015

The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge
Tomczak K, et al. Contemp Oncol (Pozn). 2015; 19(1A): A68-A77.

The Cancer Genome Atlas' 4th Annual Scientific Symposium
May 11-12 ~ Bethesda, MD

The Cancer Genome Atlas (TCGA) Data Portal
Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA

Cancer Genomics Hub: A resource of the National Cancer Institute, from the UCSC Genome Browser

Molecular classification of gastric adenocarcinoma: translating new insights from The Cancer Genome Atlas Research Network.
Sunakawa Y et al. Curr Treat Options Oncol 2015 Apr (4) 331

TCGA data and patient-derived orthotopic xenografts highlight pancreatic cancer-associated angiogenesis.
Gore J et al. Oncotarget 2015 Feb 25.

Radiogenomics of clear cell renal cell carcinoma: preliminary findings of The Cancer Genome Atlas-Renal Cell Carcinoma (TCGA-RCC) Imaging Research Group.
Shinagare AB et al. Abdom Imaging 2015 Mar 10.

Proteomics of colorectal cancer in a genomic context: First large-scale mass spectrometry-based analysis from the Cancer Genome Atlas.
Jimenez CR et al. Clin. Chem. 2015 Feb 26.

End of cancer-genome project prompts rethink
Geneticists debate whether focus should shift from sequencing genomes to analysing function. Heidi Ledford, Nature News and Comments, January 2015

Cancer Genomics: Insights into Driver Mutations - March 10, 2015

Seek and destroy: Relating cancer drivers to therapies
E. Martinez-Ledesma et al. Cell, March 9, 2015

In silico prescription of anticancer drugs to cohorts of 28 tumor types reveals targeting opportunities
C Rubio-Perez et al. Cancer Cell, March 9, 2015

MADGiC: a model-based approach for identifying driver genes in cancer.
Keegan D. Korthauer et al. Bioinformatics, January 2015

Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine.
Benjamin J Raphael et al. Genome Medicine 2014

Novel recurrently mutated genes in African American colon cancers.
Guda K et al. Proc Natl Acad Sci U S A. 2015 Jan 12

Sparse expression bases in cancer reveal tumor drivers.
Logsdon BA, et al. Nucleic Acids Res. 2015 Jan 12

Patient-specific driver gene prediction and risk assessment through integrated network analysis of cancer omics profiles.
Bertrand D, et al. Nucleic Acids Res. 2015 Jan 8

Identification of constrained cancer driver genes based on mutation timing.
Sakoparnig T, et al. PLoS Comput Biol. 2015 Jan 8;11(1):e1004027

CaMoDi: a new method for cancer module discovery.
Manolakos A, et al. BMC Genomics. 2014 Dec 12;15 Suppl 10:S8.

VHL, the story of a tumour suppressor gene.
Gossage L, et al. Nat Rev Cancer. 2014 Dec 23;15(1):55-64

Targeting the MET pathway for potential treatment of NSCLC.
Li A, et al. Expert Opin Ther Targets. 2014 Dec 23:1-12

Deciphering oncogenic drivers: from single genes to integrated pathways.
Chen J, et al. Brief Bioinform. 2014 Nov 5.

Driver and passenger mutations in cancer.
Pon JR, et al. Annu Rev Pathol. 2014 Oct 17

Hereditary Cancer Genetic Testing: Where are We? - December 18, 2014

NCI paper:Prevalence and correlates of receiving and sharing high-penetrance cancer genetic test results: Findings from the Health Information National Trends Survey
Taber J.M. et al Public Health Genomics, January 2015

Clinical decisions: Screening an asymptomatic person for genetic risk--polling results
Schulte J, et al. N Engl J Med 2014 Nov;371(20):e30

Testing for hereditary breast cancer: Panel or targeted testing? Experience from a clinical cancer genetics practice.
Doherty J, J Genet Couns. 2014 Dec 5

Hereditary colorectal cancer syndromes: American Society of Clinical Oncology clinical practice guideline endorsement of the familial risk-colorectal cancer: European Society for Medical Oncology clinical practice guidelines.
Stoffel EM, et al. J Clin Oncol. 2014 Dec 1

Population testing for cancer predisposing BRCA1/BRCA2 mutations in the Ashkenazi-Jewish community: A randomized controlled trial.
Manchanda R, et al. J Natl Cancer Inst. 2014 Nov 30;107(1)

Cost-effectiveness of population screening for BRCA mutations in Ashkenazi Jewish women compared with family history-based testing.
Manchanda R et al. J Natl Cancer Inst. 2014 Nov 30;107(1). pii: dju380. doi: 10.1093/jnci/dju380. Print 2015 Jan.

Check out our Cancer Genetic Testing Update Page for additional information and links

Cancer Genomic Tests (October 30, 2014)

Cancer Genomic Tests: Accelerating Translation - October 30, 2014

CDC-NCI paper: An overview of recommendations and translational milestones for genomic tests in cancer
Christine Q. Chang et al. Genetics in Medicine, October 22, 2014

Check out the CDC evidence-based classification of cancer genomic tests

Check out the NCI Cancer Genomics and Epidemiology Navigator for latest information on cancer genomic tests

EGAPP: A model process for evaluating genomic applications in practice and prevention. Check out cancer genomic tests, methods, evidence reviews and recommendation statements.

NCI Fact Sheet: Genetic testing for hereditary cancer syndromes

Cancer Genomics: Impact of Recent Insights - October 30, 2014

Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.
Katherine A. Hoadley et al. Cell, August 2014

Genome study overhauls cancer categories, shifts from tissues to molecular subtypes, by Kevin Mayer, Genetics Bioengineering News, Aug 8

It's time for us to think about cancer differently, by Paula Mejia, Newsweek, Aug 8

NIH- The Cancer Genome Atlas (TCGA) Initiative

NIH information: What is cancer genomics and the genetic basis of cancer?

Cancer Precision Medicine: Where Are We? - September 18, 2014

NIH announces the launch of 3 integrated precision medicine trials; ALCHEMIST is for patients with certain types of early-stage lung cancer, August 2014

National Cancer Institute's Precision Medicine Initiatives for the New National Clinical Trials Network. Jeffrey Abrams et al. ASCO Annual Meeting 2014

Personalized medicine: Special treatment.
Michael Eisenstein. Nature, September 11, 2014

Why the controversy? Start sequencing tumor genes at diagnosis. Tumor sequencing at the time of diagnosis can give significant insight for successful cancer treatment, by Shelly Gunn, Genetic Engineering & Biotechnology News, Sep 10

National Cancer Institute information: Precision medicine and targeted therapy

Genomics and precision oncology: What's a targeted therapy for cancer? An updated list of approved drugs from the National Cancer Institute (2014)

Therapy: This time it's personal
Gravitz L Nature 509, S52-S54 2014 May 29

Multi-marker solid tumor panels using next-generation sequencing to direct molecularly targeted therapies
Michael Marrone, et al. PLoS Currents Evidence on Genomic Tests 2014 May 27

Impact of cancer genomics on precision medicine for the treatment of cancer, from the National Cancer Institute

Cancer genomics and precision medicine in the 21st century, PowerPoint presentation from the National Human Genome Research Institute

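Aug 28

Materials from the TCGA annual symposia

Posted on August 28, 2015 by ulwvfje

Anyone working in bioinformatics has probably heard of TCGA, especially those in cancer research. There have been four annual symposia so far, and the materials shared are mainly slides in PDF format. If you need them, you can click through to each page and download them one by one to study at your own pace, or download them directly with the links below.

Talk videos were also shared on YouTube, but it is blocked in mainland China, so the slides will have to do.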

http://www.genome.gov/17516564

The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.

TCGA is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), which are both part of the National Institutes of Health, U.S. Department of Health and Human Services.

Meetings

The Cancer Genome Atlas Fourth Annual Scientific Symposium
May 11-12, 2015
The Cancer Genome Atlas Third Annual Scientific Symposium
May 12-13, 2014
The Cancer Genome Atlas Second Annual Scientific Symposium
November 27-28, 2012
The Cancer Genome Atlas First Annual Scientific Symposium
November 17-18, 2011

The PDF links are:

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Laird.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Durbin.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Ley.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Sartor.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Ciriello.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Imielinski.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Gao.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Carter.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Ng.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Parvin.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Raphael.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Lawrence.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Kreisberg.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Marra.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Helman.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Stuart.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Cooper.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Levine.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Natsoulis.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Haussler.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Erkkila.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Gehlenborg.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Qiao.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Sivachenko.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Sumazin.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Gutman.pdf

http://www.genome.gov/Multimedia/Slides/TCGA1/TCGA1_Mardis.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/01_Shaw.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/02_Chanock.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/03_Staudt.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/05_Creighton.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/06_Stojanov.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/07_Karchin.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/08_Mungall.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/09_Hakimi.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/10_Gao.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/11_Hayes.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/12_Troester.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/13_Knobluach.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/14_Raphael.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/15_Akbani.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/16_Giordano.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/17_Weinstein.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/18_Zheng.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/19_Getz.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/20_VanDneBroek.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/21_Liao.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/22_Khazanov.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/23_Levine.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/24_Miller.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/25_Ewing.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/26_Cirello.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/27_Verhaak.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/28_Hofree.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/29_Meyerson.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/30_Yang.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/31_Wheeler.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/32_Parfenov.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/33_Bernard-Rovira.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/34_Hast.pdf

http://www.genome.gov/Multimedia/Slides/TCGA2/36_Sellars.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/04_Brat.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/05_Mungall.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/06_Boutros.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/07_Zmuda.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/08_Benz.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/09_Zheng.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/11_Creighton.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/12_Aksoy.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/13_Dinh.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/14_Stuart.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/15_Amin.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/16_Gross.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/15_Akbani.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/18_Giordano.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/19_Amin-Mansour.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/20_Oesper.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/21_Gatza.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/22_Bernard.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/23_Sinha.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/24_Akbani.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/25_Watson.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/26_Martignetti.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/27_Bandlamudi.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/28_Fu.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/29_Akdemir.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/30_Bass.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/31_Hakimi.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/32_Wheeler.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/33_Lehmann.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/34_Gordenin.pdf

http://www.genome.gov/Multimedia/Slides/TCGA3/35_Wyczalkowski.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/02_Zenklusen.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/03_Hutter.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/04_Brat.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/05_Mungall.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/06_Linehan.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/07_Brooks.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/08_Wu.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/09_Giger.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/10_Wilkerson.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/11_Orsulic.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/12_Zhong.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/13_Knijnenburg.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/14_Akbani.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/15_Wang.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/16_Poisson.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/17_Alaeimahabadi.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/18_Noushmehr.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/19_Pantazi.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/20_Shih.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/21_Stransky.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/22_Giordano.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/23_Davidsen.pdf

http://www.genome.gov/Multimedia/Slides/TCGA4/24_Gross.pdf
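One way to fetch the whole list in one go is sketched below. It assumes the links above have been saved one per line in a text file (tcga_links.txt is a hypothetical name) and that wget is installed; the function names are only illustrative.

```shell
#!/bin/sh
# Sketch: batch-download the slide PDFs listed above.
# Assumption: the links are saved one per line in a text file
# (tcga_links.txt is a hypothetical name); blank lines are allowed.

# Print only the lines that look like PDF links.
list_pdf_urls() {
    grep -E '^https?://.*\.pdf$' "$1"
}

# Download every PDF below a target directory, keeping the URL's directory
# layout so same-named files (e.g. 04_Brat.pdf in TCGA3 and TCGA4) do not collide.
download_slides() {
    links_file=$1
    out_dir=${2:-tcga_slides}
    mkdir -p "$out_dir"
    list_pdf_urls "$links_file" |
        wget --continue --force-directories --no-host-directories \
             --directory-prefix="$out_dir" --input-file=-
}

# Usage: download_slides tcga_links.txt tcga_slides
```

Because --continue resumes partial files, the command can simply be rerun if the connection drops halfway through.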

Aug 28

Bioinformatics tutorial recommendation: an NGS course from MSU

Posted on August 28, 2015 by ulwvfje

http://angus.readthedocs.org/en/2014/index.html

Next-Gen Sequence Analysis Workshop (2014)

This is the schedule for the 2014 MSU NGS course.

This workshop has a Workshop Code of Conduct.

Download all of these materials or visit the GitHub repository.

Day	Schedule
Monday 8/4	1:30pm lecture: Welcome! (Titus) Tutorial: Day 1 - Getting started with Amazon 7pm: research presentations
Tuesday 8/5	Day 2 – Running BLAST and other things at the command line 9:15am lecture: Sequencing considerations (Titus) 10:30am: tutorial, Running command-line BLAST (Titus) Afternoon: assessment 1:15pm: tutorial, Short Read Quality Control (Elijah and Istvan) (CTB alternate version Evaluating the quality of your short reads, and trimming them) Evening: firepit social
Wed 8/6	9:15am lecture: Mapping and Assembly (Titus) 10:30am: tutorial, Variant calling (Titus) 1:15pm: Understanding the SAM format (Istvan) 7:15pm: tutorial, UNIX command line (Elijah)
Thursday 8/7	9:15am lecture: Genomic Intervals (Istvan) 10:30am mini-diversion: The Bioinformatics Skill System (Istvan) 10:45am: tutorial, Interval Analysis and Visualization (Istvan) 1:15pm: tutorial, Assembling E. coli sequences with Velvet (Titus) 5:30pm: leave for Kalamazoo
Friday 8/8	9:15am-noon lecture/tutorial, R Tutorial for NGS2014 R etc. (Ian Dworkin and Martin Schilling) 1:15pm: tutorial, Variant calling and exploration of polymorphisms 1:15pm: lecture, more variant calling (Martin Schilling) 7pm: lecture, Gene and genome annotation: PowerPoint | PDF (Daniel Standage)
Saturday 8/9	9:15am-noon: lecture/tutorial, A complete de novo assembly and annotation protocol for mRNASeq (Titus) 1:15pm: lecture/discussion, mRNAseq assembly with Trinity (Meg Staton)
Monday 8/11	9:15am lecture, mRNAseq and counting PDF (Ian Dworkin) 10:30am tutorial, RNA-seq: mapping to a reference genome with tophat and counting with HT-seq (Chris Chandler) 10:45am tutorial, RNASeq Transcript Mapping and Counting (BWA and HtSeq Flavor) (Meg) 2:15pm tutorial, Assembly with SOAPdenovo-Trans (Matt) 7:15pm tutorial, Mapping reads to transcriptomes (Trinity and SOAP) and counting.
Tuesday 8/12	9:15am lecture, mRNAseq and counting lecture 2 PDF (Ian Dworkin) 11:00am tutorial, Analyzing RNA-seq counts with DESeq 1:15pm tutorial, RNA-seq: mapping to a reference genome with BWA and counting with HTSeq (Meg) 2:00pm lecture, A tableside discussion on transcriptome assembly PDF (Matt).
Wed 8/13	9:15am-3pm lecture/tutorial, So you want to get some sequencing data in NCBI? (Adina) 7:15pm: Automation, scripts, git, and GitHub 8:15pm: firepit and a gin social
Thursday 8/14	9:15-10:15 lecture: Phylogeny-based methods for analysing genomes and metagenomes (Aaron Darling) 10:30am-noon tutorial: Genome comparison and phylogeny (Aaron) 1:15pm lecture / tutorial Looking at k-mer abundance distributions (Titus) 3pm long reads lecture / tutorial PacBio Tutorial (Matt MacManes) 6pm: BBQ (no dinner at McCrary) 8pm: invited speaker (NGS/EDAMAME) Jack Gilbert Note: Talk is in academic building auditorium. 9:15pm: firepit
Friday 8/15	9:15-9:45 closing lecture (Titus) 10am discussion about class; more stuff Links: Opinionated guides to NGS / Software Carpentry

Jul 17

Repost: Windows shortcut commands that take your office productivity up a notch

Posted on July 17, 2015 by ulwvfje

1. gpedit.msc-----Group Policy

2. sndrec32-------Sound Recorder

3. Nslookup-------IP address lookup tool

4. explorer-------open Windows Explorer

5. logoff---------log off

6. tsshutdn-------shut down after a 60-second countdown

7. lusrmgr.msc----Local Users and Groups

8. services.msc---local services settings

9. oobe/msoobe /a----check whether XP is activated

10. notepad--------open Notepad

11. cleanmgr-------Disk Cleanup

12. net start messenger----start the Messenger service

13. compmgmt.msc---Computer Management

15. conf-----------start NetMeeting

16. dvdplay--------DVD player

17. charmap--------start Character Map

18. diskmgmt.msc---Disk Management

19. calc-----------start Calculator

20. dfrg.msc-------Disk Defragmenter

21. chkdsk.exe-----Chkdsk disk check

22. devmgmt.msc--- Device Manager

23. regsvr32 /u *.dll----unregister DLL files

24. drwtsn32------ Dr. Watson (system diagnostics)

25. rononce -p ----shut down in 15 seconds

26. dxdiag---------check DirectX information

27. regedt32-------Registry Editor

28. Msconfig.exe---System Configuration Utility

29. rsop.msc-------Resultant Set of Policy

30. mem.exe--------show memory usage

31. regedit.exe----Registry Editor

32. winchat--------XP's built-in LAN chat

33. progman--------Program Manager

34. winmsd---------System Information

43. write----------WordPad

49. mplayer2-------basic Windows Media Player

50. mspaint--------Paint

51. mstsc----------Remote Desktop Connection

52. mplayer2-------media player

53. magnify--------Magnifier

54. mmc------------open the Management Console

55. mobsync--------synchronization

61. dcomcnfg-------open Component Services

62. ddeshare-------open DDE Share settings

67. nslookup-------network administration tool (name lookup)

68. ntbackup-------system backup and restore

69. narrator-------Narrator (screen reader)

70. ntmsmgr.msc----Removable Storage Manager

71. ntmsoprq.msc---Removable Storage operator requests

72. netstat -an----list all connections and listening ports

73. syncapp--------create a Briefcase

74. sysedit--------System Configuration Editor

75. sigverif-------File Signature Verification

77. shrpubw--------create a shared folder

78. secpol.msc-----Local Security Policy

81. Sndvol32-------Volume Control

82. sfc.exe--------System File Checker

83. sfc /scannow---Windows File Protection scan

85. tourstart------XP tour (the walkthrough shown after installation)

86. taskmgr--------Task Manager

87. eventvwr-------Event Viewer

88. eudcedit-------Private Character Editor

99. cmd.exe--------Command Prompt

101. certmgr.msc----Certificate Manager

104. cliconfg-------SQL Server Client Network Utility

105. Clipbrd--------Clipboard Viewer

109. ciadv.msc------Indexing Service

110. osk------------open the On-Screen Keyboard

115. fsmgmt.msc-----Shared Folders manager

116. utilman--------Utility Manager

117. iexpress-------IExpress self-extracting package wizard (the original list describes it as a tool for bundling trojans)

The command that opens the service manager is services.msc.

To start or stop a service with a known name directly from cmd:

net start [service name] starts a service

net stop [service name] stops a service

Jun 01

Samtools cannot produce pileup output and BCF (bcftools) output in the same run

Posted on June 1, 2015 by ulwvfje

Source: https://www.biostars.org/p/63429/

I'm using samtools mpileup and would like to generate both a pileup file and a vcf file as output. I can see how to generate one or the other, but not both (unless I run mpileup twice). I suspect I am missing something simple.

Specifically, calling mpileup with the -g or -u flag causes it to compute genotype likelihoods and output a bcf. Leaving these flags off just gives a pileup. Is there any way to get both, without redoing the work of producing the pileup file? Can I get samtools to generate the bcf _from_ the pileup file in some way? Generating the bcf from the bam file, when I already have the pileup, seems wasteful.

Thanks for any help!

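I wrote a script to run this and only then realized that I need two nearly identical steps to get both the mpileup-format data and the bcftools-format data, which is clearly redundant and a waste of time.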

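# Uses the old samtools/bcftools 0.1.x command syntax
for i in *.sam
do
    echo "$i"
    # SAM -> BAM, then sort and index
    samtools view -bS "$i" > "${i%.*}.bam"
    samtools sort "${i%.*}.bam" "${i%.*}.sorted"
    samtools index "${i%.*}.sorted.bam"
    # Pass 1: plain pileup (for VarScan and similar tools)
    samtools mpileup -f /home/jmzeng/ref-database/hg19.fa "${i%.*}.sorted.bam" > "${i%.*}.mpileup"
    # Pass 2: genotype likelihoods piped into bcftools for SNP calling
    samtools mpileup -guSDf /home/jmzeng/ref-database/hg19.fa "${i%.*}.sorted.bam" | bcftools view -cvNg - > "${i%.*}.vcf"
done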

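I want the mpileup format because downstream tools such as VarScan need that file to call SNPs,

while the BCF output can be used directly by bcftools for SNP calling.

As soon as -g or -u is given, samtools mpileup outputs only BCF.

To get pileup-format data, you have to run it with -f alone, without -g or -u.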

bcftools doesn't work on pileup format data. It works on bcf/vcf files.
samtools provides a script called sam2vcf.pl, which works on the output of "samtools pileup". However, this command is deprecated in newer versions. The output of "samtools mpileup" does not satisfy the requirement of sam2vcf.pl. You can check the required pileup format on lines 95-99, which is different from output of "samtools mpileup".
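Since one mpileup run cannot emit both formats, a possible workaround is to run the two passes concurrently, so the wall-clock time is closer to that of a single pass (at the cost of more CPU and disk reads). The sketch below assumes the same samtools/bcftools 0.1.x syntax and reference path as the script above; call_both is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: run the pileup pass and the BCF/VCF pass at the same time.
# Assumptions: samtools/bcftools 0.1.x option syntax (as in the script above);
# REF defaults to the reference path used in this post.
REF=${REF:-/home/jmzeng/ref-database/hg19.fa}

call_both() {
    bam=$1                       # e.g. sample.sorted.bam
    prefix=${bam%.sorted.bam}    # -> sample

    # Pass 1: plain pileup for VarScan and similar tools (background job).
    samtools mpileup -f "$REF" "$bam" > "$prefix.mpileup" &
    pid_pileup=$!

    # Pass 2: genotype likelihoods piped into bcftools (background job).
    samtools mpileup -guSDf "$REF" "$bam" | bcftools view -cvNg - > "$prefix.vcf" &
    pid_vcf=$!

    # Wait for both jobs; return non-zero if either one reports an error.
    status=0
    wait "$pid_pileup" || status=1
    wait "$pid_vcf" || status=1
    return "$status"
}

# Usage: for b in *.sorted.bam; do call_both "$b"; done
```

This does not remove the duplicated work the question complains about; it only overlaps the two passes in time.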

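May 05

useR!, the best-known international R conference

Posted on May 5, 2015 by ulwvfje

These are the talks and slides from the 2014 conference, although many of the slide decks seem to require a VPN to download from within China.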

http://user2014.stat.ucla.edu/#tutorials

Morning Tutorials Monday, 9:15

Room	Presenter	Title
Palisades Salon A+B	Max Kuhn	Applied Predictive Modeling in R
Palisades Salon C+F	Winston Chang	Interactive graphics with ggvis
Palisades Salon D+E	Yihui Xie	Dynamic Documents with R and knitr [Slides] [Examples]
Hermosa	Romain Francois	C++ and Rcpp11 for beginners [slides]
Venice	Bob Muenchen	Managing Data with R
Sproul-Landing building, 3rd floor	Matt Dowle	Introduction to data.table [Tutorial] [Talk]
Sproul-Landing building, 4th floor	Virgilio Gomez Rubio	Applied Spatial Data Analysis with R
Sproul-Landing building, 5th floor	Martin Morgan	Bioconductor

Afternoon Tutorials Monday, 14:00

Room	Presenter	Title
Palisades Salon A+B	Hadley Wickham	Data manipulation with dplyr
Palisades Salon C+F	Garrett Grolemund	Interactive data display with Shiny and R
Palisades Salon D+E	Drew Schmidt	Programming with Big Data in R
Hermosa	Søren Højsgaard	Graphical Models and Bayesian Networks with R
Venice	John Nash	Nonlinear parameter optimization and modeling in R [slides]
Sproul-Landing building, 3rd floor	Dirk Eddelbuettel	An Example-Driven Hands-on Introduction to Rcpp [slides]
Sproul-Landing building, 4th floor	Ramnath Vaidyanathan	Interactive Documents with R
Sproul-Landing building, 5th floor	Thomas Petzoldt	Simulating differential equation models in R

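The 2015 conference is about to start as well. If you are interested, it runs June 30 - July 3, 2015 in Aalborg, Denmark, with plenty of solid material being shared.

http://user2015.math.aau.dk/#BN

The 2015 program is as follows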

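May 01

ChIP-seq part 3: finding peaks with MACS

Posted on May 1, 2015 by ulwvfje

After aligning the ChIP-seq reads with Bowtie, the next step is to locate the peaks with MACS or ERANGE. ERANGE is rather complicated to configure, so MACS is the more popular choice.

1. First, install MACS

MACS comes in two versions, MACS14 and MACS2. MACS2 improves substantially on MACS14 in many respects but is still in testing at the moment, so MACS14 is used as the example here.

MACS can be downloaded with: wget https://codeload.github.com/taoliu/MACS/zip/master

It is a Python package of about 152 MB, which is quite large, and it is installed the way Python packages usually are. However, this download appears to be the latest version, so let's use version 1.4 instead: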

wget http://github.com/downloads/taoliu/MACS/MACS-1.4.2-1.tar.gz

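In fact, its README already explains how to install and use the software very clearly.

https://github.com/taoliu/MACS/blob/master/README.rst

For how MACS works, see the paper or this article:

http://www.plob.org/2014/05/08/7227.html

A single Python command installs it: python setup.py install --user

2. Next, prepare the data the software needs

This is the data mentioned in the previous two posts.

3. Then run the MACS command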

/home/jmzeng/.local/bin/macs14 -t Xu_WT_rep2_BAF155.fastq.trimmed.single.bam \
    -c Xu_WT_rep2_Input.fastq.trimmed.single.bam \
    -f BAM -g hs --bw 300 -w -S -n Xu_WT_rep2

4. Finally, a look at the results

56K Apr 30 21:54 Xu_WT_rep2_model.r

5.5K Apr 30 22:21 Xu_WT_rep2_negative_peaks.xls

783K Apr 30 22:21 Xu_WT_rep2_peaks.bed

865K Apr 30 22:21 Xu_WT_rep2_peaks.xls

766K Apr 30 22:21 Xu_WT_rep2_summits.bed

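Well, this isn't my own project anyway, so I won't bother interpreting these results for now; I'll come back and explore them when I get the chance.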
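For a quick first look at the output without a full interpretation, something like the following sketch may help. It assumes the MACS14 peaks.bed file has five tab-separated columns (chromosome, start, end, peak name, score); the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: quick sanity checks on a MACS14 *_peaks.bed file.
# Assumption: 5 tab-separated columns (chrom, start, end, name, score).

# Number of peaks, ignoring any track/browser/comment header lines.
count_peaks() {
    grep -v -c -E '^(track|browser|#)' "$1"
}

# The N highest-scoring peaks (default 10), best first.
top_peaks() {
    grep -v -E '^(track|browser|#)' "$1" | sort -k5,5nr | head -n "${2:-10}"
}

# Usage:
#   count_peaks Xu_WT_rep2_peaks.bed
#   top_peaks   Xu_WT_rep2_peaks.bed 5
```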

Reference: http://www.plob.org/2014/05/08/7227.html

Appendix: how to set the parameters.

Adapted from http://www.plob.org/2014/01/26/7118.html

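-t TFILE, --treatment=TFILE Treatment (ChIP) input file

-c CFILE, --control=CFILE Control (input) file

-n NAME, --name=NAME Prefix for the output file names

-f FORMAT, --format=FORMAT Input file format. The default is AUTO; possible values include "BED", "ELAND", "ELANDMULTI", "ELANDMULTIPET", "ELANDEXPORT", "SAM", "BAM" and "BOWTIE".

-g GSIZE, --gsize=GSIZE Size of the genome being mapped to. It can be written as 1.0e+9 or 1000000000, or with a shortcut: 'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruit fly (1.2e8). Default: hs

-s TSIZE, --tsize=TSIZE Length of the short reads (tags); default 25

-p PVALUE, --pvalue=PVALUE P-value cutoff for calling a peak; default 1e-5. Do not set this too large: above 0.9, the output may no longer be correct.

-m MFOLD, --mfold=MFOLD Range for the ratio of peak height to background; default 10,30. In other words, the ratio must be at least 10, and regions with a ratio above 30 are not treated as normal peaks either. In general, 10 is a good lower bound. If it still does not give satisfactory results, the lower bound can be reduced, but it is better to switch on --nomodel and then pass --shiftsize and --bw to MACS. --shiftsize defaults to 100 and --bw defaults to 300.

--diag Generate a full diagnostic report, including how likely each peak is to be a true peak; this slows the run down considerably.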
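The fallback described under -m could look roughly like this. It is only a sketch: it assumes macs14 is on PATH and BAM input from a human sample, the function name is illustrative, and only --nomodel and --shiftsize are passed explicitly.

```shell
#!/bin/sh
# Sketch: call peaks without model building, as the note on -m suggests.
# Assumptions: macs14 is on PATH; inputs are BAM files from a human sample;
# 100 is the default shift size quoted above.
call_peaks_nomodel() {
    treatment=$1
    control=$2
    name=$3
    shiftsize=${4:-100}
    macs14 -t "$treatment" -c "$control" -f BAM -g hs \
           --nomodel --shiftsize "$shiftsize" \
           -n "$name"
}

# Usage:
#   call_peaks_nomodel Xu_WT_rep2_BAF155.fastq.trimmed.single.bam \
#                      Xu_WT_rep2_Input.fastq.trimmed.single.bam Xu_WT_rep2_nomodel
```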

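Apr 30

Teaching myself ChIP-seq, part 2: filtering the data and aligning

Posted on April 30, 2015 by ulwvfje

This is a very mature workflow by now, so I won't go into much detail.

Let's pick two files at random to run through the ChIP-seq workflow. One of them is input DNA, a portion of the DNA set aside before immunoprecipitation, which serves as the control.

5.5G Apr 30 10:31 Xu_WT_rep2_BAF155.fastq

18G Feb 13 20:37 Xu_WT_rep2_Input.fastq

First comes quality control: filtering out low-quality reads.

Here I chose DynamicTrim.pl and LengthSort.pl.

The script is as follows: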

for id in *fastq
do
    echo "$id"
    # Trim low-quality bases; writes $id.trimmed
    perl DynamicTrim.pl "$id"
done

Next:

for id in *.trimmed
do
    echo "$id"
    # Sort the trimmed reads by length
    perl LengthSort.pl "$id"
done

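This gives the filtered reads, which are ready for alignment.

The intermediate files can of course be deleted, since they take up too much space. I only used two datasets here; running all eight from the paper would be painful.

Then align with Bowtie 2: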

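# One-time setup: index the reference
#samtools faidx hg19.fa
#bowtie2-build hg19.fa hg19
for i in *single
do
    # Align single-end reads, then convert SAM to BAM
    bowtie2 -x /home/jmzeng/ref-database/hg19 -U "$i" -S "$i.sam"
    samtools view -bS "$i.sam" > "$i.bam"
done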
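Since the intermediate files take up a lot of space, one option is to pipe the aligner straight into samtools so that no .sam file is ever written. A minimal sketch, assuming the same index prefix and the samtools 0.1.x-style view options used above (align_to_bam is a hypothetical name):

```shell
#!/bin/sh
# Sketch: align and convert in one step so no .sam file is written.
# Assumptions: bowtie2 and samtools are on PATH; INDEX defaults to the
# index prefix used above; samtools 0.1.x-style "view -bS".
INDEX=${INDEX:-/home/jmzeng/ref-database/hg19}

align_to_bam() {
    reads=$1
    # Without -S, bowtie2 writes SAM to standard output.
    bowtie2 -x "$INDEX" -U "$reads" | samtools view -bS - > "$reads.bam"
}

# Usage: for i in *single; do align_to_bam "$i"; done
```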

The resulting BAM files are then passed to MACS to find peaks.

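Apr 30

Teaching myself ChIP-seq, part 1: reading the paper

Posted on April 30, 2015 by ulwvfje

The ChIP-seq paper I chose here is titled

CARM1 Methylates Chromatin Remodeling Factor BAF155 to Enhance Tumor Progression and Metastasis

Article link: http://www.sciencedirect.com/science/article/pii/S1535610813005369

It was published in 2013, so it is fairly recent, and it focuses on the gene CARM1.

I then did a quick search for some information on this gene: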

Taxonomy ID: 9606

Gene ID: 10498

Symbol: CARM1

Synonym: PRMT4

Cross-references: MIM:603934|HGNC:HGNC:23393|Ensembl:ENSG00000142453|HPRD:09158|Vega:OTTHUMG00000180699

Location: chromosome 19, 19p13.2

Description: coactivator-associated arginine methyltransferase 1

Gene type: protein-coding

Official symbol and name: CARM1, coactivator-associated arginine methyltransferase

Other designations: histone-arginine methyltransferase CARM1|protein arginine N-methyltransferase 4

Modification date: 20150308

This gene is a transcriptional coactivator for a number of tumor-associated transcription factors.

The authors did the following four things:

Knockout of CARM1 Using ZFN in Breast Cancer Cells

Identification of BAF155 as a Novel CARM1 Substrate

Methylation of BAF155 Promotes Tumor Growth and Metastasis

Methylated BAF155 Gains Unique Chromatin Association

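So there are two kinds of cells: wild-type (WT) and mutant (MUT).

Each was done in two replicates, and each replicate has an input sample and a BAF155 immunoprecipitation sample.

A ChIP-seq experiment always has an input control file and an actual immunoprecipitation sequencing file.

That gives 2 x 2 x 2 = 8 sequencing files.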
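The count can be spelled out by enumerating the combinations. The sketch below assumes every file follows the Xu_<cell>_<replicate>_<sample>.fastq pattern of the two WT rep2 files used in part 2; the MUT and rep1 names are guesses for illustration, not taken from the paper.

```shell
#!/bin/sh
# Sketch: enumerate the 2 cell types x 2 replicates x 2 samples = 8 files.
# Assumption: names follow the Xu_<cell>_<rep>_<sample>.fastq pattern of the
# two WT rep2 files; the MUT and rep1 names are hypothetical.
expected_fastqs() {
    for cell in WT MUT; do
        for rep in rep1 rep2; do
            for sample in Input BAF155; do
                echo "Xu_${cell}_${rep}_${sample}.fastq"
            done
        done
    done
}

# Usage: expected_fastqs | while read -r f; do [ -e "$f" ] || echo "missing: $f"; done
```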