library(org.Hs.eg.db )
library(org.Hs.eg.db )
This online tool currently uses CrossMap, which supports a limited number of formats (see our online documentation for details of the individual data formats listed below). CrossMap also discards metadata in files, so track definitions, etc, will be lost on conversion.
Important note: CrossMap converts WIG files to BedGraph internally for efficiency, and also outputs them in BedGraph format.
[perl]
suppressMessages(library(ggplot2))
suppressMessages(library(RMySQL))
con <- dbConnect(MySQL(), host="127.0.0.1", port=3306, user="root", password="11111111")
dbSendQuery(con, "USE test")
gene='SOX10'
#gene='DDX11L11'
if (T){
query=paste("select * from hg19_gtf where gene_type='protein_coding' and gene_name=",shQuote(gene),sep="")
structure=dbGetQuery(con,query)
tmp_min=min(c(structure$start,structure$end))
structure$new_start=structure$start-tmp_min
structure$new_end=structure$end-tmp_min
tmp_max=max(c(structure$new_start,structure$new_end))
num_transcripts=nrow(structure[structure$record=='transcript',])
tmp_color=rainbow(num_transcripts)
x=1:tmp_max;y=rep(num_transcripts,length(x))
#x=10000:17000;y=rep(num_transcripts,length(x))
plot(x,y,type = 'n',xlab='',ylab = '',ylim = c(0,num_transcripts+1))
title(main = gene,sub = paste("chr",tmp$chr,":",tmp$start,"-",tmp$end,sep=""))
j=0;
tmp_legend=c()
for (i in 1:nrow(structure)){
tmp=structure[i,]
if(tmp$record == 'transcript'){
j=j+1
tmp_legend=c(tmp_legend,paste("chr",tmp$chr,":",tmp$start,"-",tmp$end,sep=""))
}
if(tmp$record == 'exon') lines(c(tmp$new_start,tmp$new_end),c(j,j),col=tmp_color[j],lwd=4)
}
# legend('topleft',legend=tmp_legend,lty=1,lwd = 4,col = tmp_color);
}
[/perl]
现有的基因芯片种类不要太多了!
gpl organism bioc_package1 GPL32 Mus musculus mgu74a2 GPL33 Mus musculus mgu74b3 GPL34 Mus musculus mgu74c6 GPL74 Homo sapiens hcg1107 GPL75 Mus musculus mu11ksuba8 GPL76 Mus musculus mu11ksubb9 GPL77 Mus musculus mu19ksuba10 GPL78 Mus musculus mu19ksubb11 GPL79 Mus musculus mu19ksubc12 GPL80 Homo sapiens hu680013 GPL81 Mus musculus mgu74av214 GPL82 Mus musculus mgu74bv215 GPL83 Mus musculus mgu74cv216 GPL85 Rattus norvegicus rgu34a17 GPL86 Rattus norvegicus rgu34b18 GPL87 Rattus norvegicus rgu34c19 GPL88 Rattus norvegicus rnu3420 GPL89 Rattus norvegicus rtu3422 GPL91 Homo sapiens hgu95av223 GPL92 Homo sapiens hgu95b24 GPL93 Homo sapiens hgu95c25 GPL94 Homo sapiens hgu95d26 GPL95 Homo sapiens hgu95e27 GPL96 Homo sapiens hgu133a28 GPL97 Homo sapiens hgu133b29 GPL98 Homo sapiens hu35ksuba30 GPL99 Homo sapiens hu35ksubb31 GPL100 Homo sapiens hu35ksubc32 GPL101 Homo sapiens hu35ksubd36 GPL201 Homo sapiens hgfocus37 GPL339 Mus musculus moe430a38 GPL340 Mus musculus mouse430239 GPL341 Rattus norvegicus rae230a40 GPL342 Rattus norvegicus rae230b41 GPL570 Homo sapiens hgu133plus242 GPL571 Homo sapiens hgu133a243 GPL886 Homo sapiens hgug4111a44 GPL887 Homo sapiens hgug4110b45 GPL1261 Mus musculus mouse430a249 GPL1352 Homo sapiens u133x3p50 GPL1355 Rattus norvegicus rat230251 GPL1708 Homo sapiens hgug4112a54 GPL2891 Homo sapiens h20kcod55 GPL2898 Rattus norvegicus adme16cod60 GPL3921 Homo sapiens hthgu133a63 GPL4191 Homo sapiens h10kcod64 GPL5689 Homo sapiens hgug4100a65 GPL6097 Homo sapiens illuminaHumanv166 GPL6102 Homo sapiens illuminaHumanv267 GPL6244 Homo sapiens hugene10sttranscriptcluster68 GPL6947 Homo sapiens illuminaHumanv369 GPL8300 Homo sapiens hgu95av270 GPL8490 Homo sapiens IlluminaHumanMethylation27k71 GPL10558 Homo sapiens illuminaHumanv472 GPL11532 Homo sapiens hugene11sttranscriptcluster73 GPL13497 Homo sapiens HsAgilentDesign02665274 GPL13534 Homo sapiens IlluminaHumanMethylation450k75 GPL13667 Homo sapiens hgu21976 GPL15380 Homo sapiens GGHumanMethCancerPanelv177 GPL15396 Homo sapiens hthgu133b78 GPL17897 Homo sapiens hthgu133a
gpl_info=read.csv("GPL_info.csv",stringsAsFactors = F)### first download all of the annotation packages from bioconductorfor (i in 1:nrow(gpl_info)){print(i)platform=gpl_info[i,4]platform=gsub('^ ',"",platform) ##主要是因为我处理包的字符串前面有空格#platformDB='hgu95av2.db'platformDB=paste(platform,".db",sep="")if( platformDB %in% rownames(installed.packages()) == FALSE) {BiocInstaller::biocLite(platformDB)#source("http://bioconductor.org/biocLite.R");#biocLite(platformDB )}}
下载完了所有的包, 就可以进行批量导出芯片探针与gene的对应关系!
for (i in 1:nrow(gpl_info)){print(i)platform=gpl_info[i,4]platform=gsub('^ ',"",platform)#platformDB='hgu95av2.db'platformDB=paste(platform,".db",sep="")if( platformDB %in% rownames(installed.packages()) != FALSE) {library(platformDB,character.only = T)#tmp=paste('head(mappedkeys(',platform,'ENTREZID))',sep='')#eval(parse(text = tmp))###重点在这里,把字符串当做命令运行all_probe=eval(parse(text = paste('mappedkeys(',platform,'ENTREZID)',sep='')))EGID <- as.numeric(lookUp(all_probe, platformDB, "ENTREZID"))##自己把内容写出来即可}}
Hugo_Symbol | HUGO symbol for the gene | TP53 |
Protein_Change | Amino acid change | V600E |
Pfam provides an online tool to not only generate the domain information in JSON format, but to draw the lollipop diagram using javascript as well. They have more information here: http://pfam.xfam.org/help#tabview=tab9
IMHO, not as pretty as cBioPortal's but it gets you close to a solution.
EDIT / SHAMELESS PLUG: After seeing the data available and how easy it'd be, I made my own quick tool to fetch the data and draw the diagram for me in a style similar to cBioPortal - feel free to fork it and add features: https://github.com/pbnjay/lollipops
Example output (w/ labels per the comments)
We found ourselves in the same need, we wanted such a plot (JavaScript). Thus, I add our solution, Mutations Needle Plot. The library creates an SVG image (with D3), which then may be downloaded.
You will npm in order to be able to install & run the library.
Examples may be found in the snippets folder or also the index.html - The one displayed here below
我现在还不是很确定这个方法,只是试一试,欢迎与我交流对该方法的讨论!
Wang H, Sun Q, Zhao W, et al. Individual-level analysis of differential expression of genes and pathways for personalized medicine[J]. Bioinformatics, 2014: btu522.
他们把它写成了一个R包,可以下载使用,但是必须用R2.15.2版本,我用了一下,不好用!
We can download the R code for in http://bioinformatics.oxfordjournals.org/content/31/1/62/suppl/DC1
他们这个程序真心不好用,但是很容易看懂算法,可以自己用R语言写一个来实现同样的过程!
报错如下:ERROR MESSAGE: SAM/BAM file input.marked.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups !
A general framework for estimating the relative pathogenicity of human genetic variants.
Nat Genet. 2014 Feb 2. doi: 10.1038/ng.2892.
PubMed PMID: 24487276.
Your search returned 207 results in 9 categories with the following search parameters:
BIND | the biomolecular interaction network database | died link |
DIP | the database of interacting proteins | http://dip.doe-mbi.ucla.edu/ |
MINT | the molecular interaction database | http://mint.bio.uniroma2.it/mint/ |
STRING | Search Tool for the Retrieval of Interacting Genes/Proteins | http://string-db.org/ |
HPRO | Human protein reference database | http://www.hprd.org/ |
BioGRID | The Biological General Repository for Interaction Datasets | http://thebiogrid.org/ |
之前用过有界面的那种,那样非常方便,只需要做好数据即可,但是如果有非常多的数据,每次都要点击文件,点击下一步,也很烦,不过,,它既然是java软件,就可以以命令行的形式来玩转它!
直接在官网下载java版本软件即可:http://software.broadinstitute.org/gsea/downloads.jsp
需要下载gmt文件,自己制作gct和cls文件,或者直接下载测试文件p53
hgu95av2的芯片数据,只有一万多探针,所以很快就可以出结果
CP: Canonical pathways (browse 1330 gene sets) |
Gene sets from the pathway databases. Usually, these gene sets are canonical representations of a biological process compiled by domain experts. details | Download GMT Files original identifiers gene symbols entrez genes ids |
---|---|---|
CP:BIOCARTA: BioCarta gene sets (browse 217 gene sets) |
Gene sets derived from the BioCarta pathway database (http://www.biocarta.com/genes/index.asp). | Download GMT Files original identifiers gene symbols entrez genes ids |
CP:KEGG: KEGG gene sets (browse 186 gene sets) |
Gene sets derived from the KEGG pathway database (http://www.genome.jp/kegg/pathway.html). | Download GMT Files original identifiers gene symbols entrez genes ids |
CP:REACTOME: Reactome gene sets (browse 674 gene sets) |
Gene sets derived from the Reactome pathway database (http://www.reactome.org/). | Download GMT Files original identifiers gene symbols entrez genes ids |
然后我看了一下域名,发现其实很有规律的
linux/ | 23-Jan-2008 19:47 | - | ||
macos/ | 19-Apr-2005 09:45 | - | ||
macosx/ | 12-Dec-2015 09:04 | - | ||
windows/ | 24-Feb-2012 18:41 | - |
debian/ | 15-Dec-2015 02:06 | - | ||
redhat/ | 27-Jul-2014 21:12 | - | ||
suse/ | 16-Feb-2012 15:09 | - | ||
ubuntu/ | 06-Jan-2016 04:05 | - |
precise/ | 06-Jan-2016 04:03 | - | ||
trusty/ | 06-Jan-2016 04:04 | - | ||
vivid/ | 06-Jan-2016 04:04 | - | ||
wily/ | 06-Jan-2016 04:05 | - |
所以如果,大家是想在linux系统里面安装旧版本的R,建议大家直接下载c源码,直接三部曲就可以安装啦!
这里的拷贝数变异检测芯片指的是Affymetrix Genome-Wide Human SNP Array 6.0
jmzeng@ubuntu:/home/jmzeng/bio-soft/picnic/c_code/celConverter$ java -jar CelFileConverter.jar -m Snp6FeatureMappings.csv -c ../cdf/GenomeWideSNP_6.Full.cdf -s ../celFiles/ -t ../result/
Loading CDF file: ../cdf/GenomeWideSNP_6.Full.cdf
Loading feature mapping file: Snp6FeatureMappings.csv
Processing CEL file: /home/jmzeng/bio-soft/picnic/c_code/celConverter/../celFiles/GSM1949207.CEL
Created feature intensity file: /home/jmzeng/bio-soft/picnic/c_code/celConverter/../result/GSM1949207.feature_intensity
Processing CEL file: /home/jmzeng/bio-soft/picnic/c_code/celConverter/../celFiles/GSM887898.CEL
Created feature intensity file: /home/jmzeng/bio-soft/picnic/c_code/celConverter/../result/GSM887898.feature_intensity
HMM('GSM1949207.feature_intensity','info\','result\output\','result\',10,0,2.0221,0.40997)
产品特点:
1.涵盖超过1,800,000个遗传变异标志物:包括超过906,600个SNP和超过946,000个用于检测拷贝数变化(CNV,Copy Number Variation)的探针;
2.SNP和CNV两种探针高密度且均匀地分布在整个基因组,不仅可以用于SNP基因精确分型,还可用于拷贝数变异CNV的研究;
3.744,000个探针平均分配到整个基因组上,用来发现未知的拷贝数变异区域;
4.可用于Copy-neutral LOH/UPD检测,亲子鉴定,纯合性分析、血缘关系鉴定、遗传病或其它疾病的研究。
ACC BLCA BRCA CESC COAD COADREAD DLBC ESCA GBM HNSC KICH KIRC KIRP LAML LGG LIHC LUAD LUSC OV PAAD PANCANCER PANCAN8 PANCAN12 PRAD READ SARC SKCM STAD THCA UCEC UCS
Title | Alternatively processed and compiled RNA-Sequencing and clinical data for thousands of samples from The Cancer Genome Atlas |
Organism | Homo sapiens |
Experiment type | Expression profiling by high throughput sequencing |
Summary | We reprocessed RNA-Seq data for 9264 tumor samples and 741 normal samples across 24 cancer types from The Cancer Genome Atlas with "Rsubread". Rsubread is an open source R package that has shown high concordance with other existing methods of alignment and summarization, but is simple to use and takes significantly less time to process data. Additionally, we provide clinical variables publicly available as of May 20, 2015 for the tumor samples where the TCGA ids are matched. |
官网:http://mutationassessor.org/
hg19,13,32912555,G,T BRCA2
hg18,7,55178574,G,A GBM
7,55178574,G,A GBM
or in protein space: <protein ID> <variant> <text>, where <protein ID> can be :
1. Uniprot protein accession (i.e. EGFR_HUMAN)
2. NCBI Refseq protein ID (i.e. NP_005219)
examples:
EGFR_HUMAN R521K
EGFR_HUMAN R98Q Polymorphism
EGFR_HUMAN G719D disease
NP_000537 G356A
NP_000537 G360A dbSNP:rs35993958
NP_000537 S46A Abolishes phosphorylation
ID types can be mixed in one list in any way.