Sep 24

Aligning RNA-seq reads with GMAP/GSNAP

Posted on September 24, 2015 by ulwvfje

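Publication: http://bioinformatics.oxfordjournals.org/content/26/7/873.abstract

Slides explaining the software: http://www.mi.fu-berlin.de/wiki/pub/ABI/CompMethodsWS11/MHuska_GSNAP.pdf

A worked example: http://qteller.com/RNAseq-analysis-recipe.pdf

A shell script: https://github.com/vsbuffalo/rna-seq-example

Download page: http://research-pub.gene.com/gmap/

Some researchers consider this aligner better than TopHat. Plenty of other RNA-seq aligners have appeared since, but I still want to take a quick look at it. It started out in 2005 as GMAP, a tool built specifically for aligning low-throughput EST sequences, and later evolved into GSNAP.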

step1: Download and install GMAP/GSNAP

wget http://research-pub.gene.com/gmap/src/gmap-gsnap-2015-09-21.tar.gz

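It is a standard Linux source package. Be sure to read the README before installing: http://research-pub.gene.com/gmap/src/README

Unpack it, enter the directory, and run the usual three-step source install: first ./configure, then make, and finally make install.

By default it installs into /usr/local/bin. You will probably need to change that, because you may not have write permission there: install into a directory of your own (./configure --prefix=...) and then add it to your PATH.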
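The install-into-your-own-directory advice above can be sketched roughly as follows. The prefix `$HOME/software/gmap` and the function name `install_gmap` are illustrative assumptions, and the unpacked directory is matched with a glob because its exact name may differ from the tarball name. `--prefix` is the standard autoconf option for choosing the install location.

```shell
# Hypothetical install prefix; any directory you can write to will do.
PREFIX="${PREFIX:-$HOME/software/gmap}"

# Build and install GMAP/GSNAP under $PREFIX instead of /usr/local.
# Defined as a function so that nothing is downloaded until you call it;
# the subshell body keeps the cd from affecting your current shell.
install_gmap() (
    wget http://research-pub.gene.com/gmap/src/gmap-gsnap-2015-09-21.tar.gz &&
    tar xzf gmap-gsnap-2015-09-21.tar.gz &&
    cd gmap-*/ &&    # assumption: the tarball unpacks into a gmap-* directory
    ./configure --prefix="$PREFIX" &&
    make &&
    make install
)

# Make the installed binaries (gmap, gsnap, gmap_build, ...) visible to the shell.
export PATH="$PREFIX/bin:$PATH"
```

Call `install_gmap` once to build, and put the `export PATH=...` line into your shell startup file so the tools stay available in new sessions.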

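step2: Prepare the data

Alignment generally needs only two inputs: an indexed reference genome and the sequencing reads to align.

For RNA-seq, however, GSNAP also wants the matching GTF annotation file, from which it takes known splice sites.

First, the reference genome. The software site does offer a pre-indexed hg19 genome, Human genome, version hg19 (5.5 GB) (http://research-pub.gene.com/gmap/genomes/hg19.tar.gz), but the download is slow and the index does not work with every GSNAP version. So here I index my own reference genome instead. The -d option takes the database name, and the FASTA file follows as an argument: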

gmap_build -D ./ -d my_hg19 my_hg19.fa

Then download the hg19 GTF file from Ensembl.

The downloaded GTF file has to be indexed as well, which takes two steps:

cat my_hg19.gtf | ~/software/gmap-2011-10-16/util/gtf_splicesites > my_hg19.splicesites

cat my_hg19.splicesites | iit_store -o my_hg19.gtf.index
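The indexing steps above can be chained into one small helper. This is only a sketch: the function name `build_gsnap_index`, its argument order, and the output file names are my own choices rather than anything defined by GMAP, and it assumes `gmap_build`, `gtf_splicesites`, and `iit_store` are already on your PATH. As I read the GMAP README, `iit_store -o NAME` writes its result as `NAME.iit`; check that against your version.

```shell
# Build a GSNAP genome index plus a splice-site map from a FASTA and a GTF.
# Usage: build_gsnap_index genome.fa annotation.gtf [index_dir]
build_gsnap_index() {
    fasta=$1
    gtf=$2
    index_dir=${3:-.}
    db=$(basename "$fasta" .fa)    # e.g. my_hg19.fa -> my_hg19

    # Fail early if either input file is missing.
    for f in "$fasta" "$gtf"; do
        [ -f "$f" ] || { echo "missing input: $f" >&2; return 1; }
    done

    # 1. Genome index: -D is the index directory, -d the database name,
    #    and the FASTA file is a trailing argument.
    gmap_build -D "$index_dir" -d "$db" "$fasta" || return 2

    # 2. Extract known splice sites from the GTF, then store them as a map file.
    gtf_splicesites < "$gtf" > "$db.splicesites" || return 2
    iit_store -o "$db.splicesites" < "$db.splicesites" || return 2
}
```

For the files used in this post the call would be `build_gsnap_index my_hg19.fa my_hg19.gtf ./`.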

Then copy over the RNA-seq read files that you want to align.

step3: Run the program

It is just a single alignment step:

gsnap -D /home/jschnable/gsnap_indexes/ \
    -d arabidopsisv10 \
    --nthreads=50 \
    -B 5 \
    -s /home/jschnable/gsnap_indexes/arabidopsisv10.iit \
    -n 2 \
    -Q \
    --nofails \
    --format=sam temp.fastq \
    > results.sam

There are quite a few parameters, so read the manual for the details. http://qteller.com/RNAseq-analysis-recipe.pdf explains them very thoroughly.
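As a reading aid, here is the same set of flags wrapped in a small function that only prints the command, with my understanding of each flag in the comments. Treat the flag descriptions as a summary from memory of `gsnap --help`, not as authoritative; the paths, the database name, the splice-site file name, and the function name `gsnap_cmd` are placeholders.

```shell
# Placeholder settings; override them in the environment as needed.
INDEX_DIR=${INDEX_DIR:-/path/to/gsnap_indexes}
DB=${DB:-my_hg19}
THREADS=${THREADS:-4}

# Print (do not run) a GSNAP command line for one FASTQ file.
#   -D / -d       index directory and database name
#   --nthreads    number of worker threads
#   -B 5          batch mode; higher values keep more of the index in memory
#   -s            map file of known splice sites
#   -n 2          report at most 2 alignments per read
#   -Q            print nothing for reads that exceed the -n limit
#   --nofails     do not print reads that failed to align
#   --format=sam  write SAM output
gsnap_cmd() {
    reads=$1
    printf '%s ' gsnap \
        -D "$INDEX_DIR" \
        -d "$DB" \
        --nthreads="$THREADS" \
        -B 5 \
        -s "$INDEX_DIR/$DB.splicesites.iit" \
        -n 2 \
        -Q \
        --nofails \
        --format=sam \
        "$reads"
    printf '\n'
}

gsnap_cmd temp.fastq
```

Once the printed line looks right, run it with `> results.sam` appended.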

Sep 24

Calling SNPs with freebayes

Posted on September 24, 2015 by ulwvfje

Software page: http://clavius.bc.edu/~erik/freebayes/

Tutorial: http://clavius.bc.edu/~erik/CSHL-advanced-sequencing/freebayes-tutorial.html

step1: Install the software

wget http://clavius.bc.edu/~erik/freebayes/freebayes-5d5b8ac0.tar.gz
tar xzvf freebayes-5d5b8ac0.tar.gz
cd freebayes
make
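A small hiccup: the build failed with the error /bin/sh: 1: cmake: not found
So I had to download and install cmake myself and then add it to my PATH.

First download the source package from http://www.cmake.org/cmake/resources/software.html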

wget http://cmake.org/files/v3.3/cmake-3.3.2.tar.gz

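Unpack it, enter the directory, and run the usual three-step source install: first ./configure, then make, and finally make install.

cmake installs into /usr/local/bin by default. You will probably need to change that, because you may not have write permission there: install into a directory of your own (./configure --prefix=...) and then add it to your PATH.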
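To catch the `cmake: not found` problem before `make` gets going, a small pre-flight check along these lines may help. The function names `need_tool` and `check_build_tools` are made up for this sketch, and the tool list only covers what this post ran into (`make` and `cmake`); your system may need more.

```shell
# Return 0 if the named tool is on PATH; otherwise print a hint and return 1.
need_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found: install it or add its bin directory to PATH" >&2
    return 1
}

# Check the build prerequisites mentioned in this post before running make.
check_build_tools() {
    missing=0
    for tool in make cmake; do
        need_tool "$tool" || missing=1
    done
    return "$missing"
}
```

In the freebayes source directory this could be used as `check_build_tools && make`.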

step2: Prepare the data

an alignment file (in BAM format)

a reference genome in (uncompressed) FASTA format.

My server happens to have plenty of both.

The tutorial also provides a test data set, though:

wget http://bioinformatics.bc.edu/marthlab/download/gkno-cshl-2013/chr20.fa

wget ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/data/NA12878/alignment/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
wget ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/data/NA12878/alignment/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam.bai

These commands download the chromosome 20 data for sample NA12878 from the 1000 Genomes Project.

step3: Run the command

The example given on the website is:

freebayes -f chr20.fa \
    NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam >NA12878.chr20.freebayes.vcf

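The default parameters are usually fine.

step4: Interpret the output

There is not much to interpret. The output is a VCF file, which we have all looked at a thousand times; it is the usual stuff.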
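Even so, a quick sanity check on the VCF does no harm. The sketch below counts the variant records and how many of them are plain SNPs. The function name `vcf_summary` is my own, and "SNP" here simply means that REF and every ALT allele are a single base, which is a simplification.

```shell
# Count variant records in a VCF, split into SNPs and everything else.
# A record counts as a SNP when REF and all ALT alleles are single bases.
vcf_summary() {
    awk -F'\t' '
        /^#/ { next }    # skip meta-information and header lines
        {
            total++
            n = split($5, alts, ",")
            snp = ($4 ~ /^[ACGTNacgtn]$/)
            for (i = 1; i <= n; i++)
                if (alts[i] !~ /^[ACGTNacgtn]$/) snp = 0
            if (snp) snps++
        }
        END { printf "records=%d snps=%d other=%d\n", total, snps, total - snps }
    ' "$1"
}
```

For the run above, call it as `vcf_summary NA12878.chr20.freebayes.vcf`.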

That said, the author of freebayes did compare its calls against the NA12878 variant calls that the Broad produced with GATK.

Sep 24

Cancer data collected by the Broad Institute

Posted on September 24, 2015 by ulwvfje

Disease name	Cohort	Cases
Adrenocortical carcinoma	ACC	92
Bladder urothelial carcinoma	BLCA	412
Breast invasive carcinoma	BRCA	1098
Cervical and endocervical cancers	CESC	307
Cholangiocarcinoma	CHOL	36
Colon adenocarcinoma	COAD	460
Colorectal adenocarcinoma	COADREAD	631
Lymphoid Neoplasm Diffuse Large B-cell Lymphoma	DLBC	58
Esophageal carcinoma	ESCA	185
FFPE Pilot Phase II	FPPP	38
Glioblastoma multiforme	GBM	613
Glioma	GBMLGG	1129
Head and Neck squamous cell carcinoma	HNSC	528
Kidney Chromophobe	KICH	113
Pan-kidney cohort (KICH+KIRC+KIRP)	KIPAN	973
Kidney renal clear cell carcinoma	KIRC	537
Kidney renal papillary cell carcinoma	KIRP	323
Acute Myeloid Leukemia	LAML	200
Brain Lower Grade Glioma	LGG	516
Liver hepatocellular carcinoma	LIHC	377
Lung adenocarcinoma	LUAD	585
Lung squamous cell carcinoma	LUSC	504
Mesothelioma	MESO	87
Ovarian serous cystadenocarcinoma	OV	602
Pancreatic adenocarcinoma	PAAD	185
Pheochromocytoma and Paraganglioma	PCPG	179
Prostate adenocarcinoma	PRAD	499
Rectum adenocarcinoma	READ	171
Sarcoma	SARC	260
Skin Cutaneous Melanoma	SKCM	470
Stomach adenocarcinoma	STAD	443
Stomach and Esophageal carcinoma	STES	628
Testicular Germ Cell Tumors	TGCT	150
Thyroid carcinoma	THCA	503
Thymoma	THYM	124
Uterine Corpus Endometrial Carcinoma	UCEC	560
Uterine Carcinosarcoma	UCS	57
Uveal Melanoma	UVM	80

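That is a lot of cancer types; there is still a long way to go.

Sep 24

How Perl modules are organized

Posted on September 24, 2015 by ulwvfje

How to use private modules that you wrote yourself

Put simply, a module is a collection of functions.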

Personally I prefer to keep my modules (those that I write for myself or for systems I can control) in a certain directory, and also to place them in a subdirectory. As in:

/www/modules/MyMods/Foo.pm
/www/modules/MyMods/Bar.pm

And then where I use them:

use lib qw(/www/modules);
use MyMods::Foo;
use MyMods::Bar;

As reported by "perldoc -f use":

It is exactly equivalent to
BEGIN { require Module; import Module LIST; }
except that Module must be a bareword.

Putting that another way, "use" is equivalent to:

running at compile time,
converting the package name to a file name,
require-ing that file name, and
import-ing that package.

So, instead of calling use, you can call require and import inside a BEGIN block:

BEGIN {
    require '../EPMS.pm';
    EPMS->import();
}

And of course, if your module doesn't actually do any symbol exporting or other initialization when you call import, you can leave that line out:

BEGIN { require '../EPMS.pm'; }

For example, here is one of my modules, saved as my_stat.pm:

package my_stat;

use strict;
use warnings;

# Arithmetic mean of the arguments.
sub mean {
    my $sum = 0;
    $sum += $_ foreach @_;
    return $sum / @_;
}

# Sample standard deviation (divides by n-1).
# The result differs if you divide by n instead: sqrt($sum / @_).
sub stddev {
    my $avg = mean(@_);
    my $sum = 0;
    $sum += ($_ - $avg)**2 foreach @_;
    return sqrt($sum / (@_ - 1));
}

1;    # a module file must return a true value for require/use to succeed

It defines two functions, mean and stddev, so I can load the module from any of my other Perl programs and call them directly.

use lib "./";    # depends on which directory holds your my_stat.pm
use my_stat;

print my_stat::stddev(1..10), "\n";

生信菜鸟团

You are welcome to join the discussion on the forum at biotrainee.com, or to follow the WeChat public account of the same name, biotrainee.
