TPM格式搜索错了，好尴尬

网上的答案经常不靠谱

通常情况下我会使用 featureCounts 得到表达矩阵是 raw counts, 但总是有人需要我转换成各种形式，比如 RPKM, FPKM and TPM

概念见： https://statquest.org/2015/07/09/rpkm-fpkm-and-tpm-clearly-explained/

想偷一下懒，就搜索到了这个答案： http://ny-shao.name/2016/11/18/a-short-script-to-calculate-rpkm-and-tpm-from-featurecounts-output.html

## functions for rpkm and tpm
## from https://gist.github.com/slowkow/c6ab0348747f86e2748b#file-counts_to_tpm-r-L44
## from https://www.biostars.org/p/171766/
rpkm <- function(counts, lengths) {
  rate <- counts / lengths
  rate / sum(counts) * 1e9
}

tpm <- function(counts, lengths) {
  rate <- counts / lengths
  rate / sum(rate) * 1e6
}

朋友提醒我，其实错了，因为很明显 colSums(exprSet_tpm) 并不是一百万。

其实我老早就写过TPM公式，就是RPKM的百分比的百万倍扩大值，所以还是自己动手重新写了代码。

rpkm <- function(counts, lengths) {
  rate <- counts / lengths
  rate / sum(counts) * 1e9
}

exprSet_rpkm=rpkm(exprSet,len) 
exprSet_tpm=1e6*exprSet_rpkm/colSums(exprSet_rpkm)

不过，比较奇怪的是这个时候 colSums(exprSet_tpm) 是接近一百万，而不是精确的一百万，我还没有想清楚具体缘由，是不是R的计算小数点问题。

有关于的讨论： http://blog.nextgenetics.net/?e=51#body-anchor

一	二	三	四	五	六	日
« 九
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

生信菜鸟团

欢迎去论坛biotrainee.com留言参与讨论，或者关注同名微信公众号biotrainee

TPM格式搜索错了，好尴尬