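This post follows directly from the earlier one, which took a close look at how samtools rmdup removes PCR-duplicate reads.
As before, we look at single-end and paired-end sequencing data separately, and compare the two tools.
First, for the single-end data, samtools reported: [bam_rmdupse_core] 25 / 53 = 0.4717 in library
With picard I got: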
INFO 2016-11-12 09:48:29 MarkDuplicates Read 53 records. 0 pairs never matched.
INFO 2016-11-12 09:48:31 MarkDuplicates After buildSortedReadEndLists freeMemory: 248541856; totalMemory: 3887595520; maxMemory: 57266405376
INFO 2016-11-12 09:48:31 MarkDuplicates Will retain up to 1789575168 duplicate indices before spilling to disk.
INFO 2016-11-12 09:49:14 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2016-11-12 09:49:15 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2016-11-12 09:49:15 MarkDuplicates Sorting list of duplicate records.
INFO 2016-11-12 09:54:35 MarkDuplicates After generateDuplicateIndexes freeMemory: 3885082288; totalMemory: 18204327936; maxMemory: 57266405376
INFO 2016-11-12 09:54:35 MarkDuplicates Marking 25 records as duplicates.
INFO 2016-11-12 09:54:35 MarkDuplicates Found 0 optical duplicate clusters.
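There appears to be no difference: both tools find 25 duplicates among the 53 reads. The drawback of this kind of Java software is that it is painfully slow; the log above spans about six minutes (09:48:29 to 09:54:35) for just 53 records.
picard also has no separate parameter for single-end versus paired-end data; the same command works for both.
Next I tested the paired-end data. Again there was no difference: both tools removed 4 reads. My test dataset may simply be too small to show any divergence.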
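The agreement with samtools can be checked mechanically. The following is a minimal Python sketch, assuming the log keeps the "Read N records" and "Marking M records as duplicates" wording shown above; the function name `duplicate_fraction` is made up for illustration and is not part of picard.

```python
import re

def duplicate_fraction(log_text):
    """Return (marked, total, fraction) parsed from MarkDuplicates log text."""
    total = re.search(r"Read\s+(\d+)\s+records", log_text)
    marked = re.search(r"Marking\s+(\d+)\s+records as duplicates", log_text)
    if total is None or marked is None:
        raise ValueError("expected MarkDuplicates log lines not found")
    n_total, n_marked = int(total.group(1)), int(marked.group(1))
    if n_total == 0:
        raise ValueError("log reports zero records")
    return n_marked, n_total, n_marked / n_total

# Two lines copied from the single-end log above.
log = """INFO 2016-11-12 09:48:29 MarkDuplicates Read 53 records. 0 pairs never matched.
INFO 2016-11-12 09:54:35 MarkDuplicates Marking 25 records as duplicates."""

marked, total, frac = duplicate_fraction(log)
print(f"{marked} / {total} = {frac:.4f}")  # prints "25 / 53 = 0.4717"
```

The printed ratio matches the `25 / 53 = 0.4717` line that samtools reported for the same data.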
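The exact command used is not shown in this post, so the following Python sketch is only an assumption about what a typical call looks like. It builds a MarkDuplicates command line with picard's classic `I=`, `O=`, `M=` and `REMOVE_DUPLICATES=true` arguments; the jar location and all file names are hypothetical placeholders. Note that nothing in it depends on whether the BAM is single-end or paired-end.

```python
def markduplicates_command(input_bam, output_bam, metrics_file,
                           picard_jar="picard.jar", remove=True):
    """Build (but do not run) a picard MarkDuplicates command line."""
    cmd = [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={input_bam}",     # coordinate-sorted input BAM
        f"O={output_bam}",    # output BAM
        f"M={metrics_file}",  # duplication metrics report
    ]
    if remove:
        # Without this, duplicates are only flagged, not dropped from the output.
        cmd.append("REMOVE_DUPLICATES=true")
    return cmd

# Hypothetical file names; the same call shape serves single-end and paired-end input.
for name in ("single_end", "paired_end"):
    print(" ".join(markduplicates_command(f"{name}.sorted.bam",
                                          f"{name}.rmdup.bam",
                                          f"{name}.metrics.txt")))
```

To actually run it, the returned list can be handed to `subprocess.run(cmd, check=True)`.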
INFO 2016-11-12 09:57:45 MarkDuplicates Read 30 records. 3 pairs never matched.
INFO 2016-11-12 09:57:47 MarkDuplicates After buildSortedReadEndLists freeMemory: 248541896; totalMemory: 3887595520; maxMemory: 57266405376
INFO 2016-11-12 09:57:47 MarkDuplicates Will retain up to 1789575168 duplicate indices before spilling to disk.
INFO 2016-11-12 09:58:26 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2016-11-12 09:58:26 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2016-11-12 09:58:26 MarkDuplicates Sorting list of duplicate records.
INFO 2016-11-12 10:02:59 MarkDuplicates After generateDuplicateIndexes freeMemory: 3885083112; totalMemory: 18204327936; maxMemory: 57266405376
INFO 2016-11-12 10:02:59 MarkDuplicates Marking 4 records as duplicates.
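The timestamps in the log show where the time goes. Here is a small Python sketch, assuming every line follows the `INFO <date> <time> MarkDuplicates <message>` layout seen above; it attributes the gap before the next line to the preceding message, which is only an approximation of the real stage boundaries. `stage_durations` is an illustrative helper, not something picard provides.

```python
from datetime import datetime

def stage_durations(log_lines):
    """Return [(seconds until the next log line, message), ...]."""
    parsed = []
    for line in log_lines:
        # Fields: level, date, time, tool name, then the free-text message.
        parts = line.split(maxsplit=4)
        stamp = datetime.strptime(parts[1] + " " + parts[2], "%Y-%m-%d %H:%M:%S")
        parsed.append((stamp, parts[4]))
    return [((nxt[0] - cur[0]).total_seconds(), cur[1])
            for cur, nxt in zip(parsed, parsed[1:])]

# Three lines copied from the paired-end log above.
lines = [
    "INFO 2016-11-12 09:57:47 MarkDuplicates Will retain up to 1789575168 duplicate indices before spilling to disk.",
    "INFO 2016-11-12 09:58:26 MarkDuplicates Sorting list of duplicate records.",
    "INFO 2016-11-12 10:02:59 MarkDuplicates After generateDuplicateIndexes freeMemory: 3885083112; totalMemory: 18204327936; maxMemory: 57266405376",
]
for seconds, message in stage_durations(lines):
    print(f"{seconds:6.0f} s  {message}")
```

For these lines the two gaps are 39 s and 273 s, so most of the run falls between "Sorting list of duplicate records" and the next message, even though the input is only 30 records.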
The test data is available for download; the archive contains both the scripts and the test data: http://www.biotrainee.com/jmzeng/rmDuplicate.zip