Modern Statistics for Modern Biology

I came across an interesting book, Modern Statistics for Modern Biology (the Chinese title I gave it is my own translation).

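I am recommending it mainly because so many readers have messaged our 生信技能树 (Bioinformatics Skill Tree) account asking how to fill gaps in their biology and statistics, and Modern Statistics for Modern Biology happens to cover both. It can be read online at: https://www.huber.embl.de/msmb/index.html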

The whole book also comes with companion code. The following script installs the packages it needs:

source("https://www.huber.embl.de/msmb/install_packages.R")

The data and the code can also be downloaded separately.

Table of contents:

  • Home
  • Book supplements
  • Physical Copy
  • Introduction
  • 1 Generative Models for Discrete Data
  • 2 Statistical Modeling
  • 3 High Quality Graphics in R
  • 4 Mixture Models
  • 5 Clustering
  • 6 Testing
  • 7 Multivariate Analysis
  • 8 High-Throughput Count Data
  • 9 Multivariate methods for heterogeneous data
  • 10 Networks and Trees
  • 11 Image data
  • 12 Supervised Learning
  • 13 Design of High Throughput Experiments and their Analyses
  • Statistical Concordance
  • Acknowledgements
  • References

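The book really is very detailed, with plenty of figures and code. For example, Chapter 8 covers the processing of expression count matrices from high-throughput sequencing: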

  • Goals of this chapter
  • Some core concepts
  • Count data
  • Modeling count data
  • A basic analysis
  • Critique of default choices and possible modifications
  • Multi-factor designs and linear models
  • Generalized linear models
  • Two-factor analysis of the pasilla data
  • Further statistical concepts
  • Summary of this chapter
  • Further reading
  • Exercises
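To give a feel for why a chapter on count data spends time on modeling, here is a minimal sketch of my own (not taken from the book) that compares simulated Poisson counts with overdispersed negative binomial (gamma-Poisson) counts. The mean of 50 and the `size` of 2 are arbitrary values I picked for illustration.

```r
# Simulate counts with the same mean under two models (toy values, assumed).
set.seed(1)
n  <- 10000
mu <- 50

pois_counts <- rpois(n, lambda = mu)

# For rnbinom with the mu/size parameterization: variance = mu + mu^2 / size
nb_counts <- rnbinom(n, mu = mu, size = 2)

# Poisson: variance is close to the mean.
c(mean = mean(pois_counts), var = var(pois_counts))

# Negative binomial: variance is far larger than the mean.
c(mean = mean(nb_counts), var = var(nb_counts))
```

Sequencing counts from biological replicates usually look more like the second case, which is why tools such as DESeq2 estimate a dispersion for each gene instead of assuming a plain Poisson model.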

The chapter uses the fruit fly expression count matrix and sample grouping information that ship with the R package pasilla:

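# Locate the gene-level count table bundled with the pasilla package
fn = system.file("extdata", "pasilla_gene_counts.tsv",
  package = "pasilla", mustWork = TRUE)
# Read it as a matrix: genes in rows, samples in columns
counts = as.matrix(read.csv(fn, sep = "\t", row.names = "gene_id"))

# Locate and read the sample annotation (grouping information)
annotationFile = system.file("extdata",
  "pasilla_sample_annotation.csv",
  package = "pasilla", mustWork = TRUE)
pasillaSampleAnno = readr::read_csv(annotationFile)
pasillaSampleAnno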
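Before handing a count matrix and an annotation table to any analysis package, it is worth checking that they describe the same samples in the same order. Below is a minimal base R sketch using toy data; `toy_counts`, `toy_anno` and the sample names are hypothetical stand-ins, not the pasilla objects.

```r
# Toy count matrix: 3 genes (rows) x 3 samples (columns); values are made up.
toy_counts <- matrix(
  c(10L, 0L, 5L,
    12L, 1L, 7L,
    30L, 2L, 9L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# Toy annotation whose rows are deliberately in a different order.
toy_anno <- data.frame(
  sample    = c("s3", "s1", "s2"),
  condition = c("treated", "untreated", "untreated"),
  stringsAsFactors = FALSE
)

# For each column of the matrix, find the matching annotation row.
idx <- match(colnames(toy_counts), toy_anno$sample)
stopifnot(!any(is.na(idx)))   # every sample must have an annotation row

# Reorder the annotation so that row i describes column i of the matrix.
toy_anno_aligned <- toy_anno[idx, ]
stopifnot(identical(toy_anno_aligned$sample, colnames(toy_counts)))

table(toy_anno_aligned$condition)
```

The same `match()` idea applies to real data; only the column that holds the sample identifiers changes.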

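Then, based on the grouping, set up the comparison and run the differential expression analysis with the DESeq2 package, as in the following code: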

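library("dplyr")
# Turn the grouping columns into factors; the first level is the reference
pasillaSampleAnno = mutate(pasillaSampleAnno,
  condition = factor(condition, levels = c("untreated", "treated")),
  type = factor(sub("-.*", "", type), levels = c("single", "paired")))

# Match annotation rows to the columns of the count matrix
# (the names in the annotation's file column end in "fb", the column names do not)
mt = match(colnames(counts), sub("fb$", "", pasillaSampleAnno$file))
stopifnot(!any(is.na(mt)))

library("DESeq2")
pasilla = DESeqDataSetFromMatrix(
  countData = counts,
  colData = pasillaSampleAnno[mt, ],
  design = ~ condition)
class(pasilla)

# Estimate size factors and dispersions, then fit the model
pasilla = DESeq(pasilla)

# Extract the results table and show the genes with the smallest adjusted p-values
res = results(pasilla)
res[order(res$padj), ] %>% head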
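To make the last step less of a black box, here is a minimal base R sketch of what sorting and filtering by adjusted p-value looks like. `toy_res` is a made-up data frame standing in for a results table; the gene names, fold changes and p-values are invented. It uses the Benjamini-Hochberg method, which is the default adjustment in DESeq2's `results()`, but it leaves out everything else that function does (for instance independent filtering).

```r
# Toy results table (all values invented for illustration).
toy_res <- data.frame(
  gene           = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  log2FoldChange = c(2.1, -0.3, 1.5, 0.1, -2.4),
  pvalue         = c(0.001, 0.40, 0.02, 0.90, 0.004),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment of the raw p-values.
toy_res$padj <- p.adjust(toy_res$pvalue, method = "BH")

# Sort so that the most significant genes come first.
toy_res <- toy_res[order(toy_res$padj), ]

# Keep genes below a false discovery rate threshold (0.1 is an assumption here).
sig <- subset(toy_res, padj < 0.1)
sig
```

With real data the threshold is a judgment call, and it is usually combined with a cutoff on the fold change.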

Super convenient, isn't it!

Open courses can also fill in the biology background

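The vast majority of people who move into bioinformatics engineering have at least four years of biology training, so biological macromolecules and the central dogma are no problem for them. Some students, however, come from a computer science background and keep asking me how to build up their biology. For them I recommend two courses on the MOOC platform iCourse163 (https://www.icourse163.org/).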

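More NGS courses

Don't miss the free NGS data-processing video courses on Bilibili. So far, WeChat discussion groups have been set up for the following: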
