Modern Statistics for Modern Biology

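I came across an interesting book: Modern Statistics for Modern Biology. (The Chinese title I gave it, 《现代生物学所需要的现代统计学》, is my own translation.)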

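Many readers have messaged us through the 生信技能树 backend asking how to shore up their biology and statistics knowledge, and Modern Statistics for Modern Biology happens to cover both. Read it online at: https://www.huber.embl.de/msmb/index.html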

The whole book comes with companion code, too. This script installs the R packages it uses:

source("https://www.huber.embl.de/msmb/install_packages.R")

Data

Zipped data directory; download the archive yourself: https://www.huber.embl.de/msmb/data.tar.gz
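
If you would rather fetch the archive from within R, here is a minimal sketch. The helper name `fetch_msmb_data` and the default folder `msmb` are my own assumptions for illustration, not something the book provides; only `download.file()` and `untar()` from base R are used.

```r
# Hypothetical helper (not part of the book): download and unpack the data archive.
fetch_msmb_data <- function(dest_dir = "msmb",
                            url = "https://www.huber.embl.de/msmb/data.tar.gz") {
  # Create the target folder if it does not exist yet
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  archive <- file.path(dest_dir, basename(url))
  # Skip the download when the archive is already on disk
  if (!file.exists(archive)) {
    download.file(url, destfile = archive, mode = "wb")
  }
  # Unpack next to the archive and return the extracted file names invisibly
  untar(archive, exdir = dest_dir)
  invisible(list.files(dest_dir, recursive = TRUE))
}

# fetch_msmb_data()  # uncomment to run; the download may be large
```

The call is left commented out so that sourcing the snippet does not trigger a download by accident.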

Code

Rfiles folder, available at: https://www.huber.embl.de/msmb/code/

Table of contents:

Introduction
1 Generative Models for Discrete Data
2 Statistical Modeling
3 High Quality Graphics in R
4 Mixture Models
5 Clustering
6 Testing
7 Multivariate Analysis
8 High-Throughput Count Data
9 Multivariate methods for heterogeneous data
10 Networks and Trees
11 Image data
12 Supervised Learning
13 Design of High Throughput Experiments and their Analyses
Statistical Concordance
Acknowledgements
References

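It really is thorough, with plenty of figures and code. Chapter 8, for example, covers processing expression count matrices from high-throughput sequencing: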

Goals of this chapter
Some core concepts
Count data
Modeling count data
A basic analysis
Critique of default choices and possible modifications
Multi-factor designs and linear models
Generalized linear models
Two-factor analysis of the pasilla data
Further statistical concepts
Summary of this chapter
Further reading
Exercises

The chapter uses the fruit fly count matrix and sample annotation that ship with the R package pasilla:

fn = system.file("extdata", "pasilla_gene_counts.tsv",
 package = "pasilla", mustWork = TRUE)
counts = as.matrix(read.csv(fn, sep = "\t", row.names = "gene_id"))

annotationFile = system.file("extdata",
 "pasilla_sample_annotation.csv",
 package = "pasilla", mustWork = TRUE)
pasillaSampleAnno = readr::read_csv(annotationFile)
pasillaSampleAnno
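
DESeq2 expects the rows of the sample annotation to be in the same order as the columns of the count matrix, so it is worth aligning the two before going further. The sketch below shows the idea with base R's `match()` on a toy example. The gene names, sample names and counts are made up for illustration, and the `fb` suffix only imitates the file naming in the pasilla annotation.

```r
# Toy count matrix, invented for illustration (not the pasilla values)
toy_counts <- matrix(
  c(10, 0, 5,
    12, 3, 7,
    30, 1, 9),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("untreated1", "treated1", "treated2")))

# Toy annotation whose rows are in a different order than the columns above
toy_anno <- data.frame(
  file      = c("treated2fb", "untreated1fb", "treated1fb"),
  condition = c("treated", "untreated", "treated"),
  stringsAsFactors = FALSE)

# Strip the trailing "fb", then find the annotation row for each count column
idx <- match(colnames(toy_counts), sub("fb$", "", toy_anno$file))
stopifnot(!any(is.na(idx)))  # every column must have an annotation row

# Reorder the annotation so that row i describes column i of the counts
toy_anno_aligned <- toy_anno[idx, ]
print(toy_anno_aligned)
```

The `stopifnot()` line is a cheap guard: if a sample name is misspelled, the script fails right there instead of silently pairing counts with the wrong group.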

Then, once the comparison is set up from the sample groups, the DESeq2 code below runs the differential expression analysis:

library("dplyr")
pasillaSampleAnno = mutate(pasillaSampleAnno,
condition = factor(condition, levels = c("untreated", "treated")),
type = factor(sub("-.*", "", type), levels = c("single", "paired")))

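library("DESeq2")
# Match each count-matrix column to its row in the sample annotation
mt = match(colnames(counts), sub("fb$", "", pasillaSampleAnno$file))
stopifnot(!any(is.na(mt)))

pasilla = DESeqDataSetFromMatrix(
 countData = counts,
 colData = pasillaSampleAnno[mt, ],
 design = ~ condition)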
class(pasilla)

pasilla = DESeq(pasilla)

res = results(pasilla)
res[order(res$padj), ] %>% head

Isn't that super convenient!
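
A note on the `padj` column used for sorting above: it holds p-values adjusted for multiple testing, and `results()` uses the Benjamini-Hochberg method by default (its `pAdjustMethod` argument defaults to "BH"). Here is a minimal base-R sketch of that adjustment on made-up p-values. The gene names and numbers are assumptions for illustration only, and this is not a substitute for what DESeq2 does internally.

```r
# Made-up raw p-values for five hypothetical genes
pvals <- c(geneA = 0.001, geneB = 0.04, geneC = 0.03, geneD = 0.5, geneE = 0.002)

# Benjamini-Hochberg adjustment, which controls the false discovery rate
padj <- p.adjust(pvals, method = "BH")

# Rank genes from most to least significant, like res[order(res$padj), ]
ranked <- padj[order(padj)]
print(ranked)

# Genes that pass a 10% false discovery rate threshold
hits <- names(padj)[padj < 0.1]
print(hits)
```

Filtering on `padj` rather than on the raw p-value is what keeps the list of called genes from filling up with false positives when thousands of genes are tested at once.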

Open courses can fill in the biology background, too

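The vast majority of people who move into bioinformatics engineering have at least four years of biology training, so biological macromolecules and the central dogma are no problem for them. Some students come over from a computer science background, though, and keep asking me how to fill in the biology. For them I recommend two courses on the MOOC site (https://www.icourse163.org/):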

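Genomics, Fudan University: https://www.icourse163.org/course/FUDAN-1002839009#/info
Cell Biology, Sichuan University: https://www.icourse163.org/course/SCU-46011
Search for other courses yourself and study as needed, with the aim of mastering the 100 bioinformatics basics lectures (生信基础100讲): https://mp.weixin.qq.com/s/Gr_0H4-GaTYkgUkbNHcMcg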

There are more NGS courses as well

Don't miss the free NGS data processing video courses on Bilibili. So far, WeChat discussion groups have been set up for the following:

生信菜鸟团

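You are welcome to leave a comment and join the discussion on the forum biotrainee.com, or follow the WeChat official account of the same name, biotrainee.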