What exactly does quantile normalization do to your data?

People groan at the mention of normalization, since there are dozens of methods, but for microarray and other expression data the most common by far is quantile normalization. So what exactly does it do to our expression data? First we need to be clear about one concept: in an expression matrix, each column is a sample, each row is a gene or probe, and the values are expression levels. Quantile normalization sorts each column independently, takes the row means of the sorted matrix to get a vector of rank means, and then puts each mean back at the position its rank occupied in the original column. So after normalization, the only values left in the matrix are those means. A small worked example follows.
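Here is a minimal NumPy sketch of the procedure (my own illustration, not code from any particular package; `quantile_normalize` and the toy matrix are made up for the example, and ties are broken arbitrarily, whereas production implementations typically average tied ranks):

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize X (rows = genes/probes, columns = samples)."""
    # Step 1: sort each column independently, remembering where each rank came from.
    order = np.argsort(X, axis=0)
    # Step 2: average across samples at each rank to get the rank-mean vector.
    rank_means = np.sort(X, axis=0).mean(axis=1)
    # Step 3: write each rank mean back to the position its rank occupied.
    Xn = np.empty(X.shape, dtype=float)
    for j in range(X.shape[1]):
        Xn[order[:, j], j] = rank_means
    return Xn

# Toy expression matrix: 4 probes (rows) x 3 samples (columns).
X = np.array([[5., 4., 3.],
              [2., 1., 4.],
              [3., 4., 6.],
              [4., 2., 8.]])
print(quantile_normalize(X))
# After normalization every column holds the same set of values (the rank
# means); only their order differs, matching each sample's original ranking.
```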

Nature's statistics collection: Statistics in biology

In biology, just about the only area with real technical depth and a real barrier to entry is biostatistics, and it is also the pain point of the great majority of researchers. If you are up to it, take a look at Nature's special collection on statistics, which focuses on statistics as applied in the natural sciences.

It contains a few statistical aphorisms:
Statistics does not tell us whether we are right. It tells us the chances of being wrong.
Quality is often more important than quantity.
The meaning of error bars is often misinterpreted, as is the statistical significance of their overlap.
Good experimental designs mitigate experimental error and the impact of factors not under study.
Article list:
Research methods: Know when your numbers are significant
Scientific method: Statistical errors
Weak statistical standards implicated in scientific irreproducibility
The fickle P value generates irreproducible results
Vital statistics
Experimental biology: Sometimes Bayesian statistics are better
A call for transparent reporting to optimize the predictive value of preclinical research
Power failure: why small sample size undermines the reliability of neuroscience
Basic statistical analysis in genetic case-control studies
Erroneous analyses of interactions in neuroscience: a problem of significance
Analyzing 'omics data using hierarchical models
Advantages and pitfalls in the application of mixed-model association methods
Quality control and conduct of genome-wide association meta-analyses
Circular analysis in systems neuroscience: the dangers of double dipping
A solution to dependency: using multilevel analysis to accommodate nested data
How does multiple testing correction work?
What is Bayesian statistics?
What is a hidden Markov model?
The articles below are really just the statistics covered in an ordinary textbook, but published in Nature they instantly seem far more impressive:
Points of significance: Importance of being uncertain
Points of Significance: Error bars
Points of significance: Significance, P values and t-tests
Points of significance: Power and sample size
Points of Significance: Visualizing samples with box plots
Points of significance: Comparing samples part I
Points of significance: Comparing samples part II
Points of significance: Nonparametric tests
Points of significance: Designing comparative experiments
Points of significance: Analysis of variance and blocking
Points of Significance: Replication
Points of Significance: Nested designs
Points of Significance: Two-factor designs
Points of significance: Sources of variation
Points of Significance: Split plot design
Points of Significance: Bayes' theorem
Points of significance: Bayesian statistics
Points of Significance: Sampling distributions and the bootstrap
Points of Significance: Bayesian networks
From the abstract of Power failure: why small sample size undermines the reliability of neuroscience, listed above:
A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.
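The "less well appreciated" point is just Bayes' rule. A back-of-the-envelope sketch (my own numbers, not from the paper; the 20% prior is an assumption): the probability that a significant result reflects a true effect is power times the prior, divided by that plus the false-positive mass.

```python
# Positive predictive value of a "significant" finding, by Bayes' rule.
alpha = 0.05   # type I error rate (significance threshold)
prior = 0.2    # assumed fraction of tested hypotheses that are truly non-null

for power in (0.8, 0.2):
    true_pos = power * prior            # P(significant and true effect)
    false_pos = alpha * (1 - prior)     # P(significant and no effect)
    ppv = true_pos / (true_pos + false_pos)
    print(f"power = {power:.1f} -> P(true effect | significant) = {ppv:.2f}")

# power = 0.8 -> P(true effect | significant) = 0.80
# power = 0.2 -> P(true effect | significant) = 0.50
```

Under these assumptions, dropping power from 0.8 to 0.2 cuts the chance that a significant hit is real from 80% to 50%, which is exactly the paper's point.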