自学miRNA-seq分析第五讲~miRNA表达量获取

拿到比对后的sam/bam文件之后,这只能算是level2的数据,一般我们给他人share我们的结果也是直接给表达矩阵的, miRNA分析跟mRNA分析类似,但是它的表达矩阵更好获取一点。如果是mRNA,我们一般会跟基因组来比较,而基因组就那24条参考染色体,想知道具体比对到了哪个基因,需要根据基因组注释文件来写程序提取表达量信息,现在比较流行的是htseq这个软件,我前面也写过教程如何安装和使用,这里就不啰嗦了。但是对于miRNA,因为我比对的就是那1881条前体miRNA序列,所以直接分析比对的sam/bam文件就可以知道每条参考miRNA序列的表达量了。 

## step6: counts the reads which mapping to each miRNA reference.
## we need to exclude unmapped as well as multiple-mapped  reads
## XS:i:<n> Alignment score for second-best alignment. Can be negative. Can be greater than 0 in --local mode
## NM:i:1   ## NM i Edit distance to the reference, including ambiguous bases but excluding clipping
#The following command exclude unmapped (-F 4) as well as multiple-mapped (grep -v “XS:”) reads
#samtools view -F 4 input.bam | grep -v "XS:" | wc -l
## 180466//1520320
ls *hairpin.sam  | while read id
do
samtools view  -SF 4 $id |perl -alne '{$h{$F[2]}++}END{print "$_\t$h{$_}" foreach sort keys %h }'  > ${id%%_*}.hairpin.counts
done
ls *mature.sam  | while read id
do
samtools view  -SF 4 $id |perl -alne '{$h{$F[2]}++}END{print "$_\t$h{$_}" foreach sort keys %h }'  > ${id%%_*}.mature.counts
done
上面的代码,是我自己写的脚本来算表达量,非常简单,因为我没有考虑细节,直接想得到各个样本测序数据的表达量而已。如果是比对到了参考基因组,就要根据miRNA的gff注释文件用htseq等软件来计算表达量啦。
得到了表达量,就可以跟文献来做比较啦:
### step7: compare the results with paper's
GSM1470353: control-CM, experiment1; Homo sapiens; miRNA-Seq   SRR1542714
GSM1470354: ET1-CM, experiment1; Homo sapiens; miRNA-Seq  SRR1542715
GSM1470355: control-CM, experiment2; Homo sapiens; miRNA-SeqSRR1542716
GSM1470356: ET1-CM, experiment2; Homo sapiens; miRNA-Seq SRR1542717
GSM1470357: control-CM, experiment3; Homo sapiens; miRNA-Seq SRR1542718
GSM1470358: ET1-CM, experiment3; Homo sapiens; miRNA-Seq SRR1542719
### 下面我用R语言来检验一下,我得到的分析结果跟文章发表的结果的区别。
 a=read.table("bowtie_bam/SRR1542714.mature.counts")
 b=read.table("paper_results/GSM1470353_iPS_010313_Unstim_known_miRNA_counts.txt")
 plot(log(tmp[,2]),log(tmp[,3]))
 cor(tmp[,2],tmp[,3])
##[1] 0.8413439
相关性还不错,总算没有分析错咯。
这个代码是我自己根据文章的理解写出的,因为我本身不擅长miRNA数据分析,所以在进行alignment的时候参数选择可能并不是那么友好,如果有高手能指正就最好了,可以直接打我电话告诉我,或者发邮箱给我,邮箱用户名是jmzeng1314,是163邮箱。

 

Comments are closed.