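microRNAs stopped being a hot research topic long ago, but they left behind a lot of data, and they are one layer of the multi-omics data in the TCGA project. Adding a miRNA angle to your own research is well worth it. People usually have four needs:
- Find the target genes of one or more miRNAs of interest
- Find which miRNAs regulate one or more genes of interest
- Find which diseases or drugs are associated with one or more miRNAs of interest
- Find out whether one or more miRNAs of interest regulate one or more genes of interest
If you have any of these needs, here is an R package for you, published in Nucleic Acids Res. (Sep 2014): The multiMiR R package and database: integration of microRNA–target interactions along with their disease and drug associations
I won't say much about downloading and installing the package: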
# Use mirrors in China for Bioconductor and CRAN
options(BioC_mirror = "https://mirrors.tuna.tsinghua.edu.cn/bioconductor/")
# Only one CRAN mirror can be active; a second options(repos = ...) call overrides the first.
# Alternative CRAN mirror: http://mirrors.cloud.tencent.com/CRAN/
options(repos = c(CRAN = "https://mirrors.aliyun.com/CRAN/"))
options(download.file.method = "libcurl")
options(url.method = "libcurl")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("multiMiR", ask = FALSE, update = FALSE)
After installing and loading multiMiR, you can view the update history of its database:
> library(multiMiR)
> db.ver = multimir_dbInfoVersions()
> db.ver[,1:3]
VERSION UPDATED RDA
1 2.3.0 2020-04-15 multimir_cutoffs_2.3.rda
2 2.2.0 2017-08-08 multimir_cutoffs_2.2.rda
3 2.1.0 2016-12-22 multimir_cutoffs_2.1.rda
4 2.0.0 2015-05-01 multimir_cutoffs.rda
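This is why I recommend it. First, it is R-based, so you can skip the tedious web tools. Second, its most recent update was on 2020-04-15: the authors kept it maintained even in the middle of the pandemic, which deserves credit.
Of course, you need some R programming basics to follow how the package is used. Here are courses I recommend:
- Bioinformatics Bestseller Intro, Worldwide Session (buy one, get five) (4th run): your introductory bioinformatics course
- Data Mining, 2nd run (expanded from two days to three weeks): the go-to skills course for medical students and clinicians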
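miRWalk is a collection of 12 web tools
If you really dislike R and don't want to learn it, you can of course use web tools instead:
A paper from June 2018 used miRWalk and kept, as its final result, the miRNA-mRNA interactions predicted by 7 of the tools. The paper is titled: FABP4 as a key determinant of metastatic potential of ovarian cancer. It describes the web tool as follows: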
miRWalk2.0 not only documents miRNA binding sites within the complete sequence of a gene, but also combines this information with a comparison of binding sites resulting from 12 existing miRNA-target prediction programs (DIANA-microTv4.0, DIANA-microT-CDS, miRanda-rel2010, mirBridge, miRDB4.0, miRmap, miRNAMap, doRiNA i.e., PicTar2, PITA, RNA22v2, RNAhybrid2.1 and Targetscan6.2) to build novel comparative platforms of binding sites for the promoter (4 prediction datasets), cds (5 prediction datasets), 5’- (5 prediction datasets) and 3’-UTR (13 prediction datasets) regions. It also documents experimentally verified miRNA-target interaction information collected via an automated text-mining search and data from existing resources (miRTarBase, PhenomiR, miR2Disease and HMDD) offer such information.
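The "keep only pairs predicted by at least 7 tools" filter mentioned above can be sketched in a few lines of base R. This is only an illustration on made-up data: the pair names, the tool names (tool1 to tool12) and the 0/1 matrix are all hypothetical and are not taken from miRWalk or from the cited paper.

```r
# Toy consensus filter: rows are hypothetical miRNA-mRNA pairs,
# columns are 12 hypothetical prediction tools (1 = predicted, 0 = not predicted).
tools <- paste0("tool", 1:12)
pairs <- c("miR-A|GENE1", "miR-A|GENE2", "miR-B|GENE1", "miR-C|GENE3")
hits <- matrix(0L, nrow = length(pairs), ncol = length(tools),
               dimnames = list(pairs, tools))
hits["miR-A|GENE1", 1:9] <- 1L  # supported by 9 tools
hits["miR-A|GENE2", 1:3] <- 1L  # supported by 3 tools
hits["miR-B|GENE1", 1:7] <- 1L  # supported by 7 tools
hits["miR-C|GENE3", 1:6] <- 1L  # supported by 6 tools

# Count supporting tools per pair and keep pairs with at least 7
n_tools <- rowSums(hits)
consensus <- names(n_tools)[n_tools >= 7]
print(consensus)
```

The threshold is a trade-off: a higher value gives fewer but better-supported pairs, so it is worth reporting whichever value you choose.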
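There is also miRSystem, which integrates other prediction programs (DIANA, miRanda, miRBridge, PicTar, PITA, rna22 and TargetScan) and includes validated data from TarBase and miRecords.
Of course, use whatever suits you; getting the research done is what matters!
However, multiMiR, the package recommended here, draws on 14 source databases.
They are listed at http://multimir.org/, and the URL of each source database is: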
source_url
1 http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=microT_CDS/index
2 http://www.mirz.unibas.ch/miRNAtargetPredictionBulk.php
3 http://www.ebi.ac.uk/enright-srv/microcosm/cgi-bin/targets/v5/download.pl
4 http://www.mir2disease.org
5 http://www.microrna.org/microrna/getDownloads.do
6 http://mirdb.org
7 http://mirecords.biolead.org/download.php
8 http://mirtarbase.mbc.nctu.edu.tw/php/download.php
9 http://www.pharmaco-mir.org/home/download_VERSE_db
10 http://mips.helmholtz-muenchen.de/phenomir/
11 http://dorina.mdc-berlin.de
12 http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html
13 http://carolina.imis.athena-innovation.gr/diana_tools/web/index.php?r=tarbasev8%2Findex
14 http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_61
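It covers miRNA data for three commonly studied species: human, mouse and rat. The number of records per database and species is: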
> db.count
map_name human_count mouse_count rat_count total_count
1 diana_microt 7664602 3747171 0 11411773
2 elmmo 3959112 1449133 547191 5955436
3 microcosm 762987 534735 353378 1651100
4 mir2disease 2875 0 0 2875
5 miranda 5429955 2379881 247368 8057204
6 mirdb 1990425 1091263 199250 3280938
7 mirecords 2425 449 171 3045
8 mirtarbase 544588 50673 652 595913
9 pharmaco_mir 308 5 0 313
10 phenomir 15138 491 0 15629
11 pictar 404066 302236 0 706302
12 pita 7710936 5163153 0 12874089
13 tarbase 433048 209831 1307 644186
14 targetscan 13906497 10442093 0 24348590
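From miRNA to mRNA
Finding the target genes of a single miRNA of interest
Note that miRNA IDs follow a naming convention. A mature miRNA is abbreviated as miR, prefixed with a species code and followed by a number assigned in order of discovery, e.g. hsa-miR-122. Highly homologous miRNAs get a lowercase letter after the number (a, b, c, ...), e.g. hsa-miR-34a, hsa-miR-34b and hsa-miR-34c. A miRNA precursor is typically about 70~80 nt long, and each of its two arms may yield a mature miRNA, in which case -5p or -3p is appended to the name, e.g. hsa-miR-122-5p.
So the example ID used in the code below, hsa-miR-18a-3p, should now make sense.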
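To make the naming rule concrete, here is a minimal sketch that splits an ID into species code, number, letter suffix and arm. The helper name parse_mirna_id and the regular expression are my own and only cover the simple forms described above; real miRBase names include further variants (for example a precursor index such as hsa-miR-16-1-3p) that this pattern deliberately does not handle.

```r
# Hypothetical helper: split simple miRNA IDs into their named parts.
# IDs that do not fit the simple pattern come back as NA.
parse_mirna_id <- function(ids) {
  # species - miR/let - number - optional letter - optional arm
  pattern <- "^([a-z]{3})-(miR|let)-([0-9]+)([a-z]?)(-[35]p)?$"
  matched <- regmatches(ids, regexec(pattern, ids))
  rows <- lapply(matched, function(m) {
    if (length(m) == 0) return(rep(NA_character_, 4))
    c(m[2], m[4], m[5], sub("^-", "", m[6]))
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c("species", "number", "letter", "arm")
  cbind(id = ids, out, stringsAsFactors = FALSE)
}

parsed <- parse_mirna_id(c("hsa-miR-18a-3p", "hsa-miR-122", "miR18a"))
print(parsed)
```

A check like this is handy before a batch query, because a malformed ID is easy to miss in a long vector.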
# The default is to search validated interactions in human
example1 <- get_multimir(mirna = 'hsa-miR-18a-3p', summary = TRUE)
names(example1)
# Check which types of associations were returned
table(example1@data$type)
# Detailed information of the validated miRNA-target interaction
head(example1@data)
dim(example1@data)
# Which interactions are supported by Luciferase assay?
example1@data[grep("Luciferase", example1@data[, "experiment"]), ]
example1@summary[example1@summary[,"target_symbol"] == "KRAS",]
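Since you can query one miRNA, you can of course query several in one call. In the example below, top_miRNAs is a character vector of miRNA IDs selected after differential expression analysis:
multimir_results <- get_multimir(org = 'mmu',
                                 mirna = top_miRNAs,
                                 table = 'validated',
                                 summary = TRUE)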
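From mRNA to miRNA
To find which miRNAs regulate one gene or several genes of interest, use the following code (example3 queries a single gene, example4 queries several):
example3 <- get_multimir(org = "mmu",
                         target = "Gnb1",
                         table = "predicted",
                         summary = TRUE,
                         predicted.cutoff = 35,
                         predicted.cutoff.type = "p",
                         predicted.site = "all")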
names(example3)
table(example3@data$type)
head(example3@data)
head(example3@summary)
apply(example3@summary[, 6:13], 2, function(x) sum(x > 0))
example4 <- get_multimir(org = 'hsa',
                         target = c('AKT2', 'CERS6', 'S1PR3', 'SULF2'),
                         table = 'predicted',
                         summary = TRUE,
                         predicted.cutoff.type = 'n',
                         predicted.cutoff = 500000)
example4.counts <- addmargins(table(example4@summary[, 2:3]))
example4.counts <- example4.counts[-nrow(example4.counts), ]
example4.counts <- example4.counts[order(example4.counts[, 5], decreasing = TRUE), ]
head(example4.counts)
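The queried datasets record miRNA-mRNA relationships, but there are many filtering thresholds to choose from. In the two calls above, predicted.cutoff.type = "p" with predicted.cutoff = 35 keeps the top 35% of predictions, while predicted.cutoff.type = "n" with predicted.cutoff = 500000 keeps the top 500000 predictions. Choosing sensible values requires knowing the underlying source databases well.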
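The difference between a percentage cutoff and a count cutoff is easy to see on a toy score vector. The sketch below is not multiMiR's internal code: the helper keep_top, the scores, the "higher is better" assumption and the rounding with ceiling() are all my own choices for illustration.

```r
# Toy prediction scores for 10 hypothetical miRNA-target pairs (higher = better)
scores <- c(0.95, 0.90, 0.80, 0.72, 0.60, 0.55, 0.40, 0.31, 0.20, 0.10)
names(scores) <- paste0("pair", seq_along(scores))

# Hypothetical helper: keep the top `cutoff` percent ("p") or the top `cutoff` items ("n")
keep_top <- function(scores, cutoff, type = c("p", "n")) {
  type <- match.arg(type)
  ranked <- sort(scores, decreasing = TRUE)
  n_keep <- if (type == "p") ceiling(length(ranked) * cutoff / 100) else cutoff
  head(ranked, n_keep)
}

top_pct <- keep_top(scores, 35, "p")  # top 35% of 10 pairs, rounded up
top_n   <- keep_top(scores, 3, "n")   # top 3 pairs
print(names(top_pct))
print(names(top_n))
```

A percentage cutoff scales with the size of each database, whereas a count cutoff keeps the same number of predictions regardless of size, which matters when the sources differ by orders of magnitude as in the db.count table.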
From miRNA to disease or drug
These associations come mainly from records curated in the source databases:
example2 <- get_multimir(disease.drug = 'cisplatin', table = 'disease.drug')
names(example2)
nrow(example2@data)
table(example2@data$type)
head(example2@data)
Does a set of miRNAs regulate a set of mRNAs?
load(url("http://multimir.org/bladder.rda"))
# Search all tables and keep the top 10% of predictions
example5 <- get_multimir(org = "hsa",
                         mirna = DE.miRNA.up,
                         target = DE.entrez.dn,
                         table = "all",
                         summary = TRUE,
                         predicted.cutoff.type = "p",
                         predicted.cutoff = 10,
                         use.tibble = TRUE)
table(example5@data$type)
# Extract the validated interactions
result <- select(example5, keytype = "type", keys = "validated", columns = columns(example5))
# Keep one row per miRNA-target pair
unique_pairs <-
    result[!duplicated(result[, c("mature_mirna_id", "target_entrez")]), ]
result
# Extract the disease/drug records that mention bladder
mykeytype <- "disease_drug"
mykeys <- keys(example5, keytype = mykeytype)
mykeys <- mykeys[grep("bladder", mykeys, ignore.case = TRUE)]
result <- select(example5, keytype = "disease_drug", keys = mykeys,
                 columns = columns(example5))
result
# Keep one row per miRNA-disease/drug pair, ignoring case
unique_pairs <-
    result[!duplicated(apply(result[, c("mature_mirna_id", "disease_drug")], 2,
                             tolower)), ]
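The last step above lowercases both columns before calling duplicated(), so that entries differing only in capitalization count as the same pair. Here is a minimal sketch of that idea on a made-up data frame; the miRNA names and disease/drug labels are placeholders, not real query results.

```r
# Toy records: rows 1 and 2 differ only in the capitalization of the disease name
records <- data.frame(
  mature_mirna_id = c("miR-A", "miR-A", "miR-B", "miR-B"),
  disease_drug    = c("Bladder cancer", "bladder cancer", "bladder cancer", "drug X"),
  stringsAsFactors = FALSE
)

# Lowercase both key columns, then drop rows whose lowercased key was already seen
key <- apply(records[, c("mature_mirna_id", "disease_drug")], 2, tolower)
unique_records <- records[!duplicated(key), ]
print(unique_records)
```

Note that the original capitalization of the first occurrence is kept in the output; only the comparison is case-insensitive.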
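A worked example
The code below uses the edgeR package to run a differential expression analysis on an ordinary count matrix (here, of miRNAs) and obtain a set of miRNAs of interest: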
library(edgeR)
library(multiMiR)
# Load data
counts_file <- system.file("extdata", "counts_table.Rds", package = "multiMiR")
strains_file <- system.file("extdata", "strains_factor.Rds", package = "multiMiR")
counts_table <- readRDS(counts_file)
strains_factor <- readRDS(strains_file)
table(strains_factor)
# Standard edgeR differential expression analysis
design <- model.matrix(~ strains_factor)
# Using trended dispersions
dge <- DGEList(counts = counts_table)
dge <- calcNormFactors(dge)
dge$samples$strains <- strains_factor
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTrendedDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)
# Fit GLM model for strain effect
fit <- glmFit(dge, design)
lrt <- glmLRT(fit)
# Table of unadjusted p-values (PValue) and FDR values
p_val_DE_edgeR <- topTags(lrt, adjust.method = 'BH', n = Inf)
# Getting top differentially expressed miRNA's
top_miRNAs <- rownames(p_val_DE_edgeR$table)[1:10]
With the set of miRNAs of interest in hand, you can query their target genes:
library(multiMiR)
# Plug the miRNAs into multiMiR and get their validated targets
multimir_results <- get_multimir(org = 'mmu',
                                 mirna = top_miRNAs,
                                 table = 'validated',
                                 summary = TRUE)
head(multimir_results@data)
table(multimir_results@data$mature_mirna_id)
dim(multimir_results@data)
With the multiMiR package, it really is that convenient!