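I came across an interesting book: Modern Statistics for Modern Biology (the Chinese rendering of the title, 《现代生物学所需要的现代统计学》, is my own translation).
I am bringing it up mainly because so many readers have written to the 生信技能树 backend asking how to fill in their biology and statistics knowledge, and Modern Statistics for Modern Biology happens to cover both. It can be read online at: https://www.huber.embl.de/msmb/index.html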
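The whole book comes with accompanying code and data (the Data and Code downloads on the book site). The R packages it uses can be installed with:
source("https://www.huber.embl.de/msmb/install_packages.R")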
Table of contents:
- Home
- Book supplements
- Physical Copy
- Introduction
- 1 Generative Models for Discrete Data
- 2 Statistical Modeling
- 3 High Quality Graphics in R
- 4 Mixture Models
- 5 Clustering
- 6 Testing
- 7 Multivariate Analysis
- 8 High-Throughput Count Data
- 9 Multivariate methods for heterogeneous data
- 10 Networks and Trees
- 11 Image data
- 12 Supervised Learning
- 13 Design of High Throughput Experiments and their Analyses
- Statistical Concordance
- Acknowledgements
- References
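It really is very detailed, with plenty of figures, tables and code. For example, Chapter 8 deals with processing expression count matrices from high-throughput sequencing data: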
- Goals of this chapter
- Some core concepts
- Count data
- Modeling count data
- A basic analysis
- Critique of default choices and possible modifications
- Multi-factor designs and linear models
- Generalized linear models
- Two-factor analysis of the pasilla data
- Further statistical concepts
- Summary of this chapter
- Further reading
- Exercises
The chapter uses the fruit fly expression matrix and the sample grouping information that ship with the R package pasilla:
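# Path to the gene-level count table shipped with the pasilla package
fn = system.file("extdata", "pasilla_gene_counts.tsv",
                 package = "pasilla", mustWork = TRUE)
# Read the counts as a matrix, with gene IDs as row names
counts = as.matrix(read.csv(fn, sep = "\t", row.names = "gene_id"))
# Path to the sample annotation (condition and library type per sample)
annotationFile = system.file("extdata",
                             "pasilla_sample_annotation.csv",
                             package = "pasilla", mustWork = TRUE)
pasillaSampleAnno = readr::read_csv(annotationFile)
pasillaSampleAnno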
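Then, based on the grouping, set up the comparison and run the differential expression analysis with the DESeq2 package, as in the code below: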
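library("dplyr")
# Turn condition and library type into factors; "untreated" is the reference level
pasillaSampleAnno = mutate(pasillaSampleAnno,
  condition = factor(condition, levels = c("untreated", "treated")),
  type = factor(sub("-.*", "", type), levels = c("single", "paired")))
# Row order that matches the count-matrix columns (as in the book;
# the annotation's file column carries a trailing "fb")
mt = match(colnames(counts), sub("fb$", "", pasillaSampleAnno$file))
stopifnot(!any(is.na(mt)))
library("DESeq2")
pasilla = DESeqDataSetFromMatrix(
  countData = counts,
  colData   = pasillaSampleAnno[mt, ],
  design    = ~ condition)
class(pasilla)
# Fit the model and show the genes with the smallest adjusted p-values
pasilla = DESeq(pasilla)
res = results(pasilla)
res[order(res$padj), ] %>% head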
Isn't that super convenient!
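One detail in the code above is easy to miss: the row index `mt` in `colData = pasillaSampleAnno[mt, ]` has to put the annotation rows in the same order as the columns of the count matrix. Below is a minimal base-R sketch of that alignment on toy data. The sample names and the `file`, `condition` and `type` values are made up for illustration and only mimic the shape of the pasilla annotation; this is not the real table.

```r
# Toy stand-ins (hypothetical values) for the count-matrix column names
# and for the sample annotation table
count_cols <- c("untreated1", "untreated2", "treated1", "treated2")
anno <- data.frame(
  file      = c("treated1fb", "treated2fb", "untreated1fb", "untreated2fb"),
  condition = c("treated", "treated", "untreated", "untreated"),
  type      = c("single-read", "paired-end", "single-read", "paired-end"),
  stringsAsFactors = FALSE
)

# Strip the trailing "fb", then find the annotation row for each count column
mt <- match(count_cols, sub("fb$", "", anno$file))
stopifnot(!any(is.na(mt)))  # every count column needs an annotation row

# Reorder the annotation so that row i describes count column i
anno_ordered <- anno[mt, ]

# Collapse "single-read" / "paired-end" to "single" / "paired"
anno_ordered$type <- factor(sub("-.*", "", anno_ordered$type),
                            levels = c("single", "paired"))
# Reference level first, so fold changes read as treated vs. untreated
anno_ordered$condition <- factor(anno_ordered$condition,
                                 levels = c("untreated", "treated"))

print(anno_ordered)
```

If the annotation were passed in without this reordering, DESeq2 would still run, but each sample could end up paired with the wrong condition label, so the `stopifnot` check is worth keeping.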
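Biology background can also be picked up from open courses
Most people who move into bioinformatics engineering have at least four years of biology training, so biological macromolecules and the central dogma are no problem for them. Some students coming from a computer science background, however, keep asking me how to fill in the biology. For them I recommend two courses on the MOOC platform (https://www.icourse163.org/):
- Genomics, Fudan University: https://www.icourse163.org/course/FUDAN-1002839009#/info
- Cell Biology, Sichuan University: https://www.icourse163.org/course/SCU-46011
- For other courses, search for yourself and study as needed, and aim to master the 100 bioinformatics fundamentals lectures (生信基础100讲): https://mp.weixin.qq.com/s/Gr_0H4-GaTYkgUkbNHcMcg
There are more NGS courses too
Don't miss the free NGS data-processing video courses on Bilibili. So far, WeChat discussion groups have been set up for the following: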