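This post explains how to use PeakRanger, but it differs from the software walkthroughs I have written before. The main reason is that calling peaks with MACS2 did not give me the results I expected, so I tried several other tools. PeakRanger deserves special mention: it is very simple to install, it processes data very quickly, and its results are easy to understand. Best of all, it produces an HTML report with a plot of every peak that passes the thresholds!
A Linux binary release is available, so you only need to download and unzip it. The commands are: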
## Download and install PeakRanger
cd ~/biosoft
mkdir PeakRanger && cd PeakRanger
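## -O fixes the output file name, because the URL ends in a slash
wget -O PeakRanger-1.18-Linux-x86_64.zip https://sourceforge.net/projects/ranger/files/PeakRanger-1.18-Linux-x86_64.zip/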
## Length: 1517587 (1.4M) [application/octet-stream]
unzip PeakRanger-1.18-Linux-x86_64.zip
~/biosoft/PeakRanger/bin/peakranger -h
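The notes below come from my self-study ChIP-seq data analysis tutorial series, so they were written in a mix of Chinese and English; please bear with them. They contain many links that you can follow to learn more.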
### step6.8 peak calling by PeakRanger
# PeakRanger is a multi-purpose software suite for analyzing next-generation sequencing (NGS) data; its tools are listed in the `peakranger -h` output further down. Highlights:
# Used by modENCODE, iPlant and many others
# Not just for calling narrow and broad peaks
# Runs fast, together with sleek program options
To measure the significance of the enriched regions, PeakRanger uses a binomial distribution to model the relative enrichment of sample over control.
A p-value is generated as a result. Users can thus select highly significant peaks by using a smaller -p.
In addition, users can filter peaks by the '-q' option, which controls the FDR of peaks.
For each p-value, the Benjamini-Hochberg procedure is applied to calculate the FDR.
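To make the FDR step concrete, here is a minimal shell sketch of the textbook Benjamini-Hochberg adjustment. It is not PeakRanger's own code, which this post does not show; the function name `bh_adjust` and the one-p-value-per-line input format are assumptions made for illustration.

```shell
# bh_adjust (hypothetical helper): read one p-value per line on stdin and
# print "p<TAB>BH-adjusted p", sorted by ascending p-value.
bh_adjust() {
  sort -g | awk '
    NF { p[++n] = $1 }                 # p-values arrive in ascending order
    END {
      running_min = 1
      # Walk from the largest p-value down, so each adjusted value is the
      # minimum of p * n / rank over all ranks at or above the current one.
      for (i = n; i >= 1; i--) {
        adj = p[i] * n / i
        if (adj < running_min) running_min = adj
        q[i] = running_min
      }
      for (i = 1; i <= n; i++) printf "%s\t%.4g\n", p[i], q[i]
    }'
}

# Example call:
#   printf "0.01\n0.04\n0.03\n0.005\n" | bh_adjust
```

For the four p-values in the example call, the adjusted values come out as 0.02, 0.02, 0.04 and 0.04, so all four would pass a cutoff of 0.05 and none would pass 0.01.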
## Gene annotation (hg19 refGene from UCSC), used later with --gene_annot_file
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
gunzip refGene.txt.gz ; mv refGene.txt hg19refGene.txt
#### software: http://ranger.sourceforge.net/ (to build from source instead, go to the root path of the unzipped package and type: make)
#### readme: http://ranger.sourceforge.net/manual1.18.html
# http://www.broadinstitute.org/~anshul/projects/encode/preprocessing/peakcalling/peakranger/bin/MANUAL
### ~/biosoft/PeakRanger/bin/peakranger -h ## my copy is already installed
nr      estimate data quality
lc      calculate library complexity
wig     generate wiggle files
wigpe   generate wiggle files for paired reads
ranger  peak calling for sharp peaks
ccat    peak calling for broad peaks
bcp     peak calling for complex broad peaks
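## The list above shows the subcommands. PeakRanger reads alignment files in several formats directly. Here I pass BED files (I converted SAM to BAM and then to BED), but there is no need to go to that trouble: just use the BAM files directly.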
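If you go the BAM route, the only thing that changes is the `--format` value and the file names. The sketch below prints a ccat command line without running it. Everything in it is an assumption for illustration: the helper names `peak_format` and `ccat_command`, the `PEAKRANGER` variable, guessing the format from the file name, and `bam` as the `--format` value (check `peakranger ccat -h` on your install). The argument order and the remaining options simply mirror the BED command used in this post.

```shell
# Path to the binary; override by setting PEAKRANGER before calling (assumption).
PEAKRANGER=${PEAKRANGER:-$HOME/biosoft/PeakRanger/bin/peakranger}

# peak_format FILE (hypothetical helper): guess a --format value from the file name.
peak_format() {
  case "$1" in
    *.bam) echo bam ;;
    *bed)  echo bed ;;   # matches .bed and names such as *.clean_bed
    *)     echo "cannot guess format of $1" >&2; return 1 ;;
  esac
}

# ccat_command FIRST_READS SECOND_READS OUTPUT_PREFIX (hypothetical helper):
# print, but do not run, a ccat command with the same options as in this post.
ccat_command() {
  fmt=$(peak_format "$1") || return 1
  printf '%s\n' "$PEAKRANGER ccat --format $fmt $1 $2 $3 --report --gene_annot_file hg19refGene.txt -q 0.05 -t 4"
}

# Example call (the BAM file names are made up):
#   ccat_command SRR1042593.bam SRR1042594.bam Xu_MUT_rep1_ccat_report
```

Printing the command first makes it easy to check the file order and options before committing to a run; paste the printed line into the shell once it looks right.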
~/biosoft/PeakRanger/bin/peakranger nr --format bed SRR1042593.clean_bed SRR1042594.clean_bed
~/biosoft/PeakRanger/bin/peakranger ccat --format bed SRR1042593.clean_bed SRR1042594.clean_bed \
Xu_MUT_rep1_ccat_report --report --gene_annot_file hg19refGene.txt -q 0.05 -t 4
The results come back quickly. A lot of peaks are found, but they still need to be filtered.
844K Jun 30 09:32 Xu_MUT_rep1_ccat_report_details
637K Jun 30 09:32 Xu_MUT_rep1_ccat_report_region.bed
798K Jun 30 09:32 Xu_MUT_rep1_ccat_report_summit.bed
The file to focus on is the details file. Its format is shown below and is easy to understand:
#region_chr  region_start  region_end  nearby_genes(6kbp)  region_ID                   region_summits  region_fdr  region_strand  region_treads  region_creads
chr1         121482750     121486000                       ccat_fdrPassed_0_fdr_0.001  121485025       0.001       +              551            642
chr1         115296600     115302500   CSDE1               ccat_fdrFailed_0_fdr_0.646  115301075       0.646       +              58             217
chr1         114351100     114356850   PTPN22,RSBN1        ccat_fdrFailed_3_fdr_0.646  114355425       0.646       +              48             112
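Since the peaks still need filtering, one simple option is awk on the details file. This is a sketch under assumptions: the function name `filter_details` is hypothetical, the default cutoff of 0.05 just mirrors the `-q 0.05` used in the ccat command, and gene names are assumed to contain no spaces. It reads `region_fdr` as the fourth field from the end, so rows with an empty nearby_genes column (like the first row above) are still handled.

```shell
# filter_details FILE [MAX_FDR] (hypothetical helper): print the header line
# plus every row whose region_fdr is at or below MAX_FDR (default 0.05).
filter_details() {
  awk -v max_fdr="${2:-0.05}" '
    /^#/ { print; next }                          # keep the header line
    # region_fdr is followed by strand, treads and creads, so it is always
    # the fourth field from the end, even when nearby_genes is empty.
    NF >= 4 && $(NF - 3) + 0 <= max_fdr + 0 { print }
  ' "$1"
}

# Example call:
#   filter_details Xu_MUT_rep1_ccat_report_details 0.05 > Xu_MUT_rep1_ccat_fdr0.05_details
```

Applied to the three rows shown above with a cutoff of 0.05, only the `ccat_fdrPassed_0_fdr_0.001` row is kept.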
It is easy to use, but for the exact parameters and conditions you will need to read the manual yourself.
Guide: Peak Calling for ChIP-Seq : http://epigenie.com/guide-peak-calling-for-chip-seq/