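There are two important parameters:
- -m <int>: the maximum number of reportable alignments a read may have in the reference; if more than <int> exist, all alignments for that read are suppressed (bowtie help: "suppress all alignments if > <int> exist (def: no limit)")
- -v or -n: the maximum number of mismatches allowed, from 0 to 3. -v counts mismatches over the whole read and ignores base qualities. -n (default 2) counts mismatches only in the seed, i.e. the first -l bases of the read (default 28), and also caps the sum of mismatch qualities with -e (default 70). The two options are mutually exclusive.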
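To make the interaction between -v and -m concrete, here is a toy model of the two rules. It is only a sketch under simplifying assumptions: ungapped alignment, forward strand only, every valid hit collected. It is not bowtie's actual search, which also considers the reverse strand and by default reports just one alignment per read (-k 1). The reference names and sequences are hypothetical.

```python
def count_mismatches(read, window):
    """Mismatches between a read and an equal-length reference window."""
    return sum(a != b for a, b in zip(read, window))


def toy_align(read, references, v, m=None):
    """Toy model of bowtie's -v / -m rules (ungapped, forward strand only).

    Returns (hits, suppressed). hits lists every (ref_name, offset, mismatches)
    with at most v mismatches; if m is set and more than m hits exist, all hits
    are dropped and suppressed is True, mimicking -m.
    """
    hits = []
    for name, seq in references.items():
        for offset in range(len(seq) - len(read) + 1):
            mm = count_mismatches(read, seq[offset:offset + len(read)])
            if mm <= v:
                hits.append((name, offset, mm))
    if m is not None and len(hits) > m:
        return [], True
    return hits, False


# Hypothetical reference sequences, for illustration only.
refs = {"refA": "ACGTACGTAA", "refB": "ACGTACCTAA", "refC": "TTTTTTTTTT"}
print(toy_align("ACGTACGT", refs, v=0))       # ([('refA', 0, 0)], False)
print(toy_align("ACGTACGT", refs, v=1))       # ([('refA', 0, 0), ('refB', 0, 1)], False)
print(toy_align("ACGTACGT", refs, v=1, m=1))  # ([], True): 2 hits > m=1
```

The point of the example: loosening the mismatch limit creates more valid hits, which makes it more likely that a read crosses the -m ceiling and loses even its perfect hit.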
We test this on the GSE60292 dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE60292) from the paper "RNA expression profiling of human iPSC-derived cardiomyocytes in a cardiac hypertrophy model", PLoS One 2014;9(9):e108051, PMID: 25255322.
With all default parameters:
The default is -n 2 with no -m limit, which is a fairly permissive alignment condition.
bowtie $mature $id -S ${id}_mature.sam
bowtie $hairpin $id -S ${id}_hairpin.sam
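When the same two commands have to be repeated for several samples and parameter sets, it can help to build them programmatically. The sketch below only assembles the argument lists (a dry run); the index prefixes `mature_idx` and `hairpin_idx` are hypothetical stand-ins for whatever `$mature` and `$hairpin` point to, and the output naming mirrors `${id}_mature.sam`.

```python
import shlex


def bowtie_commands(fastq, indexes, extra_args=()):
    """Build one bowtie command (as an argument list) per index for a FASTQ file.

    Output files are named "<fastq>_<label>.sam", mirroring ${id}_mature.sam.
    """
    commands = []
    for label, index_prefix in indexes.items():
        commands.append(
            ["bowtie", *extra_args, index_prefix, fastq, "-S", f"{fastq}_{label}.sam"]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical index prefixes; substitute your own bowtie-build outputs.
    indexes = {"mature": "mature_idx", "hairpin": "hairpin_idx"}
    for args in bowtie_commands("SRR1542714_clean.fq.gz", indexes, ["-v", "1", "-m", "3"]):
        print(shlex.join(args))
        # To actually run it: subprocess.run(args, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if file names ever contain unusual characters.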
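The resulting logs (for each sample, the first block is the alignment against mature, the second against hairpin):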
SRR1542714_clean.fq.gz
# reads processed: 1520320
# reads with at least one reported alignment: 704761 (46.36%)
# reads that failed to align: 815559 (53.64%)
Reported 704761 alignments
# reads processed: 1520320
# reads with at least one reported alignment: 950496 (62.52%)
# reads that failed to align: 569824 (37.48%)
Reported 950496 alignments
SRR1542715_clean.fq.gz
# reads processed: 1461555
# reads with at least one reported alignment: 774383 (52.98%)
# reads that failed to align: 687172 (47.02%)
Reported 774383 alignments
# reads processed: 1461555
# reads with at least one reported alignment: 1126280 (77.06%)
# reads that failed to align: 335275 (22.94%)
Reported 1126280 alignments
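To compare runs without copying numbers by hand, the summary that bowtie prints can be parsed. This is a minimal sketch that assumes the summary format shown in these logs ("# reads ...: N (P%)"); each "# reads processed" line is taken to start a new run.

```python
import re

# Matches lines such as "# reads that failed to align: 815559 (53.64%)".
LINE_RE = re.compile(r"^# (reads [^:]+): (\d+)")


def parse_bowtie_summary(text):
    """Parse bowtie summary lines into a list of dicts, one per run."""
    runs, current = [], {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip "Reported N alignments" and sample names
        key, value = match.group(1), int(match.group(2))
        if key == "reads processed" and current:
            runs.append(current)  # a new run begins
            current = {}
        current[key] = value
    if current:
        runs.append(current)
    return runs


def alignment_rate(run):
    """Percentage of processed reads with at least one reported alignment."""
    aligned = run["reads with at least one reported alignment"]
    return round(100 * aligned / run["reads processed"], 2)
```

Applied to the SRR1542714 log above, `alignment_rate` reproduces the 46.36% (mature) and 62.52% (hairpin) that bowtie reports.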
With -v 1 -m 3 added:
bowtie -v 1 -m 3 $mature $id -S ${id}_mature.sam
bowtie -v 1 -m 3 $hairpin $id -S ${id}_hairpin.sam
Alignment logs (per sample: mature first, then hairpin):
SRR1542714_clean.fq.gz
# reads processed: 1520320
# reads with at least one reported alignment: 477143 (31.38%)
# reads that failed to align: 985841 (64.84%)
# reads with alignments suppressed due to -m: 57336 (3.77%)
Reported 477143 alignments
# reads processed: 1520320
# reads with at least one reported alignment: 612741 (40.30%)
# reads that failed to align: 737467 (48.51%)
# reads with alignments suppressed due to -m: 170112 (11.19%)
Reported 612741 alignments
SRR1542715_clean.fq.gz
# reads processed: 1461555
# reads with at least one reported alignment: 623142 (42.64%)
# reads that failed to align: 779432 (53.33%)
# reads with alignments suppressed due to -m: 58981 (4.04%)
Reported 623142 alignments
# reads processed: 1461555
# reads with at least one reported alignment: 883202 (60.43%)
# reads that failed to align: 437340 (29.92%)
# reads with alignments suppressed due to -m: 141013 (9.65%)
Reported 883202 alignments
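With -n 1 -m 3 instead, the results are identical to -v 1 -m 3. This is plausible because the reads are shorter than the default 28-nt seed, so the -n seed covers the whole read and only the -e quality limit differs: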
SRR1542714_clean.fq.gz
# reads processed: 1520320
# reads with at least one reported alignment: 477143 (31.38%)
# reads that failed to align: 985841 (64.84%)
# reads with alignments suppressed due to -m: 57336 (3.77%)
Reported 477143 alignments
# reads processed: 1520320
# reads with at least one reported alignment: 612741 (40.30%)
# reads that failed to align: 737467 (48.51%)
# reads with alignments suppressed due to -m: 170112 (11.19%)
Reported 612741 alignments
SRR1542715_clean.fq.gz
# reads processed: 1461555
# reads with at least one reported alignment: 623142 (42.64%)
# reads that failed to align: 779432 (53.33%)
# reads with alignments suppressed due to -m: 58981 (4.04%)
Reported 623142 alignments
# reads processed: 1461555
# reads with at least one reported alignment: 883202 (60.43%)
# reads that failed to align: 437340 (29.92%)
# reads with alignments suppressed due to -m: 141013 (9.65%)
Reported 883202 alignments
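The -n/-v equivalence for short reads can be illustrated by comparing the two ways of counting mismatches. This sketch assumes bowtie's default seed length of 28 and deliberately ignores the -e quality-sum limit that also applies in -n mode, so it shows why the two settings can coincide, not that they always will. The 22-nt sequence is hsa-let-7a-5p from this section; its one-mismatch variant and the 36-nt reads are hypothetical.

```python
DEFAULT_SEED_LEN = 28  # bowtie 1 default for -l/--seedlen


def seed_mismatches(read, ref, seed_len=DEFAULT_SEED_LEN):
    """Mismatches within the seed (first seed_len bases): what -n limits."""
    return sum(a != b for a, b in zip(read[:seed_len], ref[:seed_len]))


def total_mismatches(read, ref):
    """Mismatches across the whole read: what -v limits."""
    return sum(a != b for a, b in zip(read, ref))


let7a = "TGAGGTAGTAGGTTGTATAGTT"     # hsa-let-7a-5p, 22 nt
variant = "TGAGGTAGTAGGTTGTATAGAT"   # hypothetical 1-mismatch variant
print(seed_mismatches(variant, let7a), total_mismatches(variant, let7a))  # 1 1

# For a read longer than the seed, a mismatch past base 28 is invisible to -n.
long_ref = "A" * 36
long_read = "A" * 35 + "C"           # hypothetical 36-nt read
print(seed_mismatches(long_read, long_ref), total_mismatches(long_read, long_ref))  # 0 1
```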
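With -n 2 -m 3 (SRR1542714 only). Because -n 2 is the default, the number of reads that failed to align is the same as in the default run; -m 3 only moves reads from "aligned" to "suppressed" (mature: 469486 + 235275 = 704761; hairpin: 579377 + 371119 = 950496):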
SRR1542714_clean.fq.gz
# reads processed: 1520320
# reads with at least one reported alignment: 469486 (30.88%)
# reads that failed to align: 815559 (53.64%)
# reads with alignments suppressed due to -m: 235275 (15.48%)
Reported 469486 alignments
# reads processed: 1520320
# reads with at least one reported alignment: 579377 (38.11%)
# reads that failed to align: 569824 (37.48%)
# reads with alignments suppressed due to -m: 371119 (24.41%)
Reported 579377 alignments
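The accounting behind this comparison can be checked mechanically. The sketch below uses the SRR1542714 numbers from the default and -n 2 -m 3 logs; the `BowtieRun` class and `lost_to_m` helper are illustrative names, not part of bowtie. It reports what fraction of the reads aligned under default settings were discarded by -m 3.

```python
from dataclasses import dataclass


@dataclass
class BowtieRun:
    processed: int
    aligned: int
    failed: int
    suppressed: int = 0  # reads dropped by -m (0 when -m is not set)

    def is_consistent(self) -> bool:
        """Every processed read is either aligned, unaligned, or suppressed."""
        return self.aligned + self.failed + self.suppressed == self.processed


def lost_to_m(default: BowtieRun, with_m: BowtieRun) -> float:
    """Fraction of default-aligned reads that -m suppressed.

    Only meaningful when both runs use the same mismatch settings (here -n 2),
    so the set of unaligned reads is identical.
    """
    if default.failed != with_m.failed:
        raise ValueError("runs use different mismatch settings")
    return with_m.suppressed / default.aligned


mature_default = BowtieRun(1520320, 704761, 815559)
mature_m3 = BowtieRun(1520320, 469486, 815559, 235275)
hairpin_default = BowtieRun(1520320, 950496, 569824)
hairpin_m3 = BowtieRun(1520320, 579377, 569824, 371119)

print(f"mature:  {100 * lost_to_m(mature_default, mature_m3):.1f}% lost to -m 3")
print(f"hairpin: {100 * lost_to_m(hairpin_default, hairpin_m3):.1f}% lost to -m 3")
```

With these logs, roughly a third of the mature alignments and about two fifths of the hairpin alignments disappear once -m 3 is combined with the permissive default mismatch setting.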
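Inspect one read, SRR1542718.507, a 21-nt read that matches hsa-let-7a-5p exactly. Under default parameters it aligns with no mismatches (NM:i:0, MD:Z:21); under -n 2 -m 3 it is reported as unmapped (FLAG 4), presumably because with up to 2 mismatches it also aligns to other let-7 family members and exceeds the -m 3 ceiling: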
>hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p
TGAGGTAGTAGGTTGTATAGTT
# default parameters
SRR1542718.507 0 hsa-let-7a-5p 1 255 21M * 0 0 TGAGGTAGTAGGTTGTATAGT ???B;AA::9B:>7>C96;99 XA:i:0 MD:Z:21 NM:i:0 XM:i:2
# -n 2 -m 3 parameters
SRR1542718.507 4 * 0 0 * * 0 0 TGAGGTAGTAGGTTGTATAGT ???B;AA::9B:>7>C96;99 XM:i:3
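The same check can be done in code by reading the FLAG field (bit 0x4 marks an unmapped read) and the optional tags of each SAM record. This is a minimal sketch: real SAM is tab-delimited, but `str.split()` also works on the space-separated copies above because SAM read names and quality strings cannot contain spaces. A library such as pysam would be the more robust choice for whole files.

```python
def parse_sam_line(line):
    """Parse one SAM alignment line into its core fields and optional tags."""
    fields = line.split()
    tags = {}
    for tag in fields[11:]:               # optional TAG:TYPE:VALUE fields
        key, typ, value = tag.split(":", 2)
        tags[key] = int(value) if typ == "i" else value
    flag = int(fields[1])
    return {
        "qname": fields[0],
        "flag": flag,
        "rname": fields[2],
        "pos": int(fields[3]),
        "unmapped": bool(flag & 0x4),     # FLAG bit 0x4: segment unmapped
        "tags": tags,
    }


default_line = ("SRR1542718.507 0 hsa-let-7a-5p 1 255 21M * 0 0 "
                "TGAGGTAGTAGGTTGTATAGT ???B;AA::9B:>7>C96;99 XA:i:0 MD:Z:21 NM:i:0 XM:i:2")
m3_line = ("SRR1542718.507 4 * 0 0 * * 0 0 "
           "TGAGGTAGTAGGTTGTATAGT ???B;AA::9B:>7>C96;99 XM:i:3")

for rec in (parse_sam_line(default_line), parse_sam_line(m3_line)):
    print(rec["rname"], "unmapped" if rec["unmapped"] else "mapped", rec["tags"])
```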