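Dec 01

My bioinformatics videos are now online

Posted on December 1, 2015 by ulwvfje

Youku has its annoyances, but it has a large audience. You can find all of my videos at http://i.youku.com/trainee and watch them for free!

The video list is at http://www.bio-info-trainee.com/tmp/tutorial/video_list.html ; links that can be clicked are videos that have finished recording.

The syllabus for this course is at http://www.bio-info-trainee.com/tmp/tutorial/syllabus.htm , but it is quite likely to change!

Disclaimer
Hello everyone, and welcome to this free public-interest bioinformatics video course organized by 生信菜鸟团 (the bio-info-trainee group). I am the instructor, Jimmy.
This course is intended only for students with a biology background who want to move into bioinformatics data analysis; nobody else is in the target audience.
This course was recorded in mainland China and will try to comply with mainland Chinese law. For any legal issues, please contact my lawyer. Thank you!
I will do my best to keep every point in the videos accurate. If anything is explained incorrectly, criticism and corrections are welcome!
Just email me (jmzeng1314@outlook.com); unless it is really necessary, please don't contact me on QQ or WeChat.

Also, I'm not short of money, so all the videos are free. Once recording is finished I will share them on Baidu Cloud; right now they are still in testing.

Please don't criticize me for no reason. I am not serving anyone, so nobody is my customer and I have no obligation to be nice to you. Watch it or don't!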

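Nov 15

Collected resources on the HapMap Project

Posted on November 15, 2015 by ulwvfje

Official website: http://hapmap.ncbi.nlm.nih.gov/index.html.en

Tutorials: http://hapmap.ncbi.nlm.nih.gov/tutorials.html.en

All of the data are hosted at NCBI: ftp://ftp.ncbi.nlm.nih.gov/hapmap/

Today the project's data are mainly used to compare the variants you have called yourself against HapMap's population variant data.

There are genotyping-array data as well as WES and WGS data, and the platforms have been updated over time: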

Jul 07 2009 00:00    Directory affy100k
Mar 05 2010 00:00    Directory affy500k
Jun 02 2010 00:00    Directory hapmap3_affy6.0

The data releases have been updated as well:

Jul 07 2009 00:00    Directory 2005-03_phaseI
Dec 03 2009 00:00    Directory 2005-11_phaseII
Jul 07 2009 00:00    Directory 2007-03
Jul 07 2009 00:00    Directory 2008-03
Jul 07 2009 00:00    Directory 2008-07_phaseIII
Jul 07 2009 00:00    Directory 2008-10_phaseII
Jul 07 2009 00:00    Directory 2009-01_phaseIII
Jul 07 2009 00:00    Directory 2009-02_phaseII+III
Aug 18 2010 00:00    Directory 2010-05_phaseIII
Sep 19 2010 00:00    Directory 2010-08_phaseII+III

The data have all been compiled into bulk downloads:

Bulk data
- Genotypes: Individual genotype data submitted to the DCC to date. Phase 3 data is available in PLINK format and HapMap format.
- Frequencies: Allele & genotype frequencies compiled from genotyping data submitted to the DCC to date. These have also been submitted to dbSNP and should be available in the next dbSNP build.
- LD Data: Linkage disequilibrium properties D', LOD , R² compiled from the genotype data to date
- Phasing Data: Phasing data generated using the PHASE software, compiled from the genotype data to date.
- Allocated SNPs: dbSNP reference SNP clusters that have been picked and prioritized for genotyping according to several criteria (see info on how SNPs were selected). The file 00README contains per-chromosome SNP counts and further details.
- CNV Genotypes: CNV data from HapMap3 samples.
- Recombination rates and Hotspots: Recombination rates and hotspots compiled from the genotyping data.
- SNP assays: Details about assays submitted to the DCC to date. PCR primers, extension probes etc., specific to each genotyping platform.
- Perlegen amplicons: Details for mapping Perlegen amplicons to HapMap assayLSID. For primer sequences, see Perlegen's Long Range PCR Amplicon data.
- Raw data: Raw signal intensity data from HapMap genotypes. Currently includes data from Affymetrix GeneChip 100k and 500k Mapping Arrays.
- Inferred genotypes: Genotypes inferred using the method of Burdick et al. Nat Genet 38:1002-4.
- Mitochondrial and chrY haplogroups: Classification of phase I HapMap samples into mtDNA and chrY haplogroups. The distribution shown in Table 4 of the HapMap phase I paper (Nat Genet 38:1002-4) corresponds to unrelated parents in each one of the populations analyzed.

The consortium has also published many papers:

The International HapMap Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52-58. 2010.
The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851-861. 2007.
The International HapMap Consortium. A Haplotype Map of the Human Genome. Nature 437, 1299-1320. 2005.
The International HapMap Consortium. The International HapMap Project. Nature 426, 789-796. 2003.
The International HapMap Consortium. Integrating Ethics and Science in the International HapMap Project. Nature Reviews Genetics 5, 467-475. 2004.
Thorisson, G.A., Smith, A.V., Krishnan, L., and Stein, L.D. The International HapMap Project Web site. Genome Research 15, 1591-1593. 2005.

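Oct 23

How a bioinformatics beginner can teach themselves programming

Posted on October 23, 2015 by ulwvfje

This was originally a question I saw on Zhihu, so I took some time to answer it: http://www.zhihu.com/question/36701137/answer/68928111

First of all, wanting to read source code is a very good sign. Well-written, professional source code really is a shortcut to becoming a better programmer. Most of us will never get one-on-one guidance, so we have to rely on our own insight.

I'll answer from my own experience. I started as a complete beginner with a pure biology background, and I think my programming is now reasonably good!

First, whichever language you pick (Perl, Python, R, MATLAB, it doesn't matter), each has plenty of introductory books. Get through one or two of them, even if you only skim without fully digesting everything. There are no good or bad books here, so don't ask me for titles. You must finish them to learn the basics of programming.

The next step is the most important: practice, and keep practicing, applying programming to real tasks. That is the fastest way to learn; otherwise, however many books you read, the ideas stay abstract.

I especially recommend one toolkit that implements many operations commonly needed in bioinformatics. The site is: Bioinformatics Tools
It contains the following 64 tools, and the web page clearly describes what each one does. The tools are actually very simple, but writing programs like these is a very effective way to learn.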
"Combines multiple FASTA entries into a single sequence."
"Returns the entire sequence contained in an EMBL file in FASTA format."
"Parses the feature table of an EMBL file and returns the feature sequences."
"Parses the feature table of an EMBL file and returns the protein translations."
"Removes non-DNA characters from text."
"Removes non-protein characters from text."
"Returns the entire sequence contained in a GenBank file in FASTA format."
"Parses the feature table of a GenBank file and returns the feature sequences."
"Parses the feature table of a GenBank file and returns the protein translations."
"Converts single letter amino acid codes to three letter codes."
"Reads a list of positions and ranges and returns those parts of a DNA sequence."
"Reads a list of positions and ranges and returns those parts of a protein sequence."
"Determines the reverse-complement, reverse, or complement of the sequence you enter."
"Separates bases according to codon position."
"Converts a FASTA sequence into multiple sequences."
"Converts three letter amino acid codes to one letter codes."
"Returns DNA sequence segments specified by a position and window size."
"Returns protein sequence segments specified by a position and window size."
"Plots codon frequency (according to the codon table you enter) for each codon in a DNA sequence."
"Returns a standard codon usage table."
"Returns a list of potential CpG islands."
"Calculates the molecular weight of DNA sequences."
"Returns positions of the patterns you enter."
"Returns basic sequence statistics."
"Returns sequences that are identical or similar to a query sequence."
"Returns sequences that are identical or similar to a query sequence."
"Accepts aligned sequences in FASTA format and calculates the identity and similarity of each sequence pair."
"Can be used to predict a DNA sequence in another species using a protein sequence alignment."
"Finds DNA sequences that can easily be converted to a restriction site."
"Determines the positions of open reading frames."
"Returns the optimal global alignment for two coding DNA sequences."
"Returns the optimal global alignment for two DNA sequences."
"Returns the optimal global alignment for two protein sequences."
"Returns a report describing PCR primer properties"
"Generates PCR products from a template and two primer sequences."
"Returns the grand average of hydropathy value of protein sequences."
"Returns the predicted isoelectric point of protein sequences."
"Calculates the molecular weight of protein sequences."
"Returns positions of the patterns you enter."
"Returns basic sequence statistics."
"Converts the sequence you enter into restriction fragments."
"Returns the number and positions of restriction sites."
"Can be used to convert protein into DNA."
"Returns the translation in the reading frame you specify."
"Colors a sequence alignment based on sequence conservation."
"Colors a protein alignment based on biochemical properties of residues."
"Numbers and groups DNA according to your specifications."
"Numbers and groups amino acids according to your specifications."
"Shows PCR primer annealing sites, translations, and restriction sites."
"Shows restriction sites and protein translations."
"Shows protein translations."
"Introduces random mutations into DNA sequences."
"Introduces random mutations into protein sequences."
"Generates a random coding sequence of the length you specify."
"Generates a random DNA sequence of the length you specify."
"Replaces regions of the DNA sequences you enter with random bases."
"Generates a random protein sequence of the length you specify."
"Replaces regions of the protein sequences you enter with random residues."
"Samples bases from a DNA sequence with replacement."
"Samples residues from a protein sequence with replacement."
"Randomly shuffles the DNA sequences you enter."
"Randomly shuffles the protein sequences you enter."
"IUPAC codes for DNA and protein."
"The genetic codes used in the Sequence Manipulation Suite."
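Once you have implemented all of these, you will not only have learned to program, but also learned how programming is applied in bioinformatics!
Any of Perl, Python, R, or MATLAB will do. For these tasks the choice makes no real difference, so don't agonize over languages.
I don't recommend that beginners read source code, because production source code is too formal. Declaring variables takes dozens of lines, and defining functions takes hundreds more. By contrast, people actually doing bioinformatics rarely write scripts longer than fifty lines. Each of the 64 data-processing tasks above usually takes only seven or eight lines of Perl.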
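To show how short these tasks are, here is a minimal Python sketch of the "reverse-complement, reverse, or complement" tool from the list above. It is not the Sequence Manipulation Suite's own code. It assumes DNA input and handles only the IUPAC codes in its translation table; any other character passes through unchanged.

```python
# Complement table for DNA: IUPAC ambiguity codes R/Y and K/M are swapped,
# N maps to N, and lowercase input stays lowercase.
COMPLEMENT = str.maketrans("ACGTNRYKMacgtnrykm", "TGCANYRMKtgcanyrmk")


def reverse(seq: str) -> str:
    """Reverse the sequence without complementing it."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Complement each base; characters not in the table are left unchanged."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the opposite strand read 5' to 3'."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    s = "ATGCGTAA"
    print(reverse(s))             # AATGCGTA
    print(complement(s))          # TACGCATT
    print(reverse_complement(s))  # TTACGCAT
```

The core of the tool is the single `str.translate` call plus slicing.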
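If you don't believe me, look at the code hosted in this GitHub repository: trinityrnaseq/util/misc at master · trinityrnaseq/trinityrnaseq · GitHub
It contains many Perl scripts for all kinds of data conversion. They are written so formally that a problem solvable in one line becomes hundreds or even thousands of lines. Unless you plan to publish or sell your code, ordinary bioinformatics research simply doesn't need this!
Of course, back to your original question: where can you find source code?
First, you can go to the library and look at books. Many come with a CD containing both videos and source code, or the book tells you where to download the code, for example pleac/include/perl at master · pleac/pleac · GitHub
Next, you can look at the many bioinformatics packages, most of which are hosted on GitHub. This page lists more than three hundred tools for transcriptome (RNA-seq) analysis: List of RNA-Seq bioinformatics tools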
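This page collects hundreds of alignment tools used in bioinformatics: Continuously updated collection of NGS alignment tools
(Remember, every one of these tools was published in a paper and is very hard to master; getting through even one in your lifetime is an achievement. Take me: I dug into bowtie, and I still only half understand it.)
Even the common bioinformatics databases, such as NCBI, Ensembl, and UCSC, have their own source packages.
Here is Ensembl's. It shares all of its code, which is really convenient: Ensembl Project · GitHub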
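You can learn programming by following this code: Ensembl/ensembl-pipeline · GitHub
Its official website also has very detailed help documentation: Help & Documentation
Do you still lack learning materials?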

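Oct 20

An MIT PhD leaves academia and sparks a heated discussion among thousands of people (Part 1)

Posted on October 20, 2015 by ulwvfje

I came across a great article, so I took some time to translate it! Funnily enough, the blog is blocked in China, so I am only posting my translation.

Goodbye, my academic career!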

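For the past 12 years I have been doing research and teaching, and I have enjoyed it. Yet a week ago I resigned from my postdoctoral position at MIT, giving up the academic dream I had pursued for so long. I feel free and happy, and that is a very bad sign for the future of the life sciences in the United States.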

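Michael Eisen, one of my co-advisors from graduate school at Berkeley, recently wrote on his blog that this is a great time to do science but a terrible time to be a scientist. A few months ago I discussed the NIH research funding crisis with my other co-advisor, Jasper Rine. Jasper said that unless NIH wakes up and there is a major restructuring, we will lose an entire generation of scientists. I am a member of this generation, and I am out.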

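In 2001, as I was about to graduate from college, I turned down a well-paid programming position at a hedge fund. Instead, I chose to do bioinformatics at the renowned Cold Spring Harbor Laboratory for a much lower salary. I was excited about the chance to do biological research with computational tools. Two years later I happily entered graduate school in molecular biology, and my salary dropped by half for the next six years. Even as a postdoc at MIT, I am not back to what I earned ten years ago as a junior programmer with no real skills or domain-specific knowledge. In industry, pay keeps pace with knowledge and skills; in academia, the two seem completely decoupled.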

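Luckily, my wife has always supported my passion for science. Since 2006 she has balanced my foolhardiness with a practical job as a physician's assistant. She is well paid, which lets us pay off our loans and cover the monthly expenses in Cambridge. We have a daughter in daycare and another child due within a month. Our finances would be in better shape with me as a stay-at-home dad than as a postdoc at MIT.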

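Back when I was a graduate student, I was well aware of the downsides of an academic career. I accepted the meager pay, the inability to choose where to live, and the insane workloads of professors. I accepted the uncertainty of whether, after 10-12 years as a graduate student and postdoc, I would actually become a professor. I accepted that even after reaching that lofty goal, I might be denied tenure five years later and have to move to another university. I accepted that even with tenure, I would spend my whole life worrying about research funding for my lab. I saw all of this as the price of doing something I love.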

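However, one worry that should only have come once I was a professor has been frightening me for over five years already: the uncertainty of getting NIH funding. Let me rephrase that: what terrifies me is the near-certainty that any grant I submit will be rejected. I have been waiting for the funding situation to improve, but it only seems to get worse. I personally know about ten scientists who became professors in the last 3-4 years. Not one of them has had a grant funded: just rejection, after rejection, after rejection. One of them is a brilliant young professor who has applied thirteen times and been rejected every time, despite glowing reviews and high marks for innovation. Her startup funds are running out and she is about to lose her lab. The prospect has cost her sleepless nights, and she now needs sleeping pills. How could this not terrify me?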

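Since my teens I have been obsessed with the idea that work should be something you want to come back to after a weekend. For the past twelve years, I saw academia as the only path to that. Fortunately, a year ago I co-founded a startup to build an open, up-to-date, central protocol repository for life scientists. I have enjoyed every step of getting it going. I am certain the company will give me the feeling I still get from science: wanting to go to work every day.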

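I don't yet know whether scientists will use what we are building, or whether we can raise the capital needed to build what I dream of. Resigning from my postdoc a week ago was very risky. Risky, but not crazy. What seems crazy is staying on the academic track. I say this even though this past year has been the most scientifically productive of my life, and I am closer to a professorship than ever before.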

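I realize many readers will dismiss this as sour grapes. Others will say my desire for academia was not strong enough, or that my real goal is to get rich. If that is your view, you are simply hoping that future scientists will be unable to love anything other than being a professor. I truly love research and teaching, and I will miss them; it hurts. But I also love my wife, and if she had treated me the way academia treats its scientists, I would have left her long ago.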

Original article: http://anothersb.blogspot.com/2014/02/goodbye-academia.html

I got around the firewall and copied the original text, shown below:

Goodbye Academia

I have enjoyed research and teaching for the last twelve years. Yet, I have resigned from my postdoctoral position at MIT a week ago, giving up on the dream of an academic position. I feel liberated and happy, and this is a very bad sign for the future of life sciences in the United States.

Michael Eisen, my co-advisor from graduate school at Berkeley recently wrote that it is a great time to do science but a terrible time to be a scientist. A few months ago I was discussing with my other co-adviser Jasper Rine the crisis in NIH research funding awards (better known as "lottery"). Jasper said that unless NIH wakes up and there is a major restructuring, we will lose an entire generation of scientists. I am a member of this generation, and I am out.

In 2001, about to graduate from college, I turned down a programming position at a hedge fund. Instead, I chose to do bioinformatics at Cold Spring Harbor Laboratory for a much lower salary. I was excited about the possibilities of doing biological research using computational tools. Two years later, I enthusiastically entered graduate school in molecular biology, with my salary dropping by half for the next six years. As a postdoctoral researcher at MIT, I am not even back to earning what I did ten years ago as a junior programmer with no skills or domain-specific knowledge. In a commercial setting, my compensation would have kept pace with my knowledge and skills, but in academia, there seems to be a complete decoupling of the two.

Luckily, my wife has always been supportive of my passion for science and balanced my foolhardiness with a practical job as a physician’s assistant since 2006. She is well compensated, allowing us to pay off our loans and afford the monthly expenses in Cambridge. With a daughter in daycare and another child due in a month, we would certainly be in a better financial shape with me as a stay-at-home dad than a postdoctoral scientist at MIT.

Science has also meant wrenching moves across the country. In 2003, we moved to California for me to begin my graduate studies. We both love New York, and my wife was devastated to leave her family and friends. In 2009, after many tearful discussions, she agreed to move to Boston from California for my postdoc. The next move for a professor position would surely require moving to yet another new place in the country.

As a graduate student, I was well aware of all of the negatives of an academic career. I accepted the miniscule pay, the inability to choose where to live, and the insane workloads of professors. I accepted the uncertainty of whether, after 10-12 years as a graduate student and postdoc, I would actually get a job as a professor. I accepted that even after attaining this lofty goal, five years later, I could be denied tenure and would have to move to another university or go into industry. I accepted that even with tenure, I would have to worry my entire life about securing research funding for the lab. I saw all of these as the price to pay for doing something that I love.

However, one aspect of being a professor has been terrifying me for over five years now – the uncertainty of getting funding from NIH. No let me rephrase that. What is terrifying is the near-certainty that any grant I submit would be rejected. I have been waiting for the funding situation to improve, but it seems to only be getting worse. I personally know about ten scientists who have become professors in the last 3-4 years. Not a single one of them has been able to get a grant proposal funded; just rejection, after rejection, after rejection. One of these is a brilliant young professor who has applied for grants thirteen times and has been rejected consistently, despite glowing reviews and high marks for innovation. She is on the brink of losing her lab as her startup funds are running out and the prospect of this has literally led to sleepless nights and the need for sleeping pills. How can this not terrify me?

I have been obsessed since my teens with the idea that work should be something one desires to come back to after a weekend. For the last twelve years, being an academic was the only path I saw toward this. Fortunately, a year ago, I co-founded a startup to create an open, up-to-date, central protocol repository for life scientists. I have enjoyed every step of getting ZappyLab going, and I am certain that the company will give me the feeling that I still get from science - wanting to go into work every day.

I don’t know yet if scientists will use what we are building. I don’t know if we will be able to raise the capital needed to build what I dream of building. By resigning from my postdoc a week ago, I have done something very risky. Risky, but not crazy. What seems crazy is aiming to stay in the academic track. I say this despite having had the most scientifically productive year of my life; I am closer to getting a professorship than ever before.

I realize that many will dismiss my story as a tale of sour grapes, or say that my desire is not strong enough or my primary motivation is to get rich. If that is your position, you are simply hoping that future scientists will be unable to love anything other than being a professor. I do love research and teaching with every fiber of my being. I will miss them and it will hurt. But I also love my wife, and if she had treated me the way academia treats its scientists, I would have left her long ago.

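There are many other good articles: http://toddharris.net/blog/2015/03/23/its-time-to-reboot-bioinformatics-education/

http://www.michaeleisen.org/blog/?p=1270

http://simplystatistics.org/tag/bioinformatics/

Their discussions of bioinformatics resemble those on China's ScienceNet (科学网), but the ScienceNet bloggers are of rather limited caliber.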

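Oct 12

Salary levels for bioinformatics engineers in the US

Posted on October 12, 2015 by ulwvfje

While browsing a forum today, I came across a bioinformatics job posting from Penn State (The Pennsylvania State University): https://psu.jobs/job/60050

Interestingly, the posting shows their salary bands. The bioinformatics position is at band K or L, which means an annual salary of at least about $49,000. That is quite a lot once converted to RMB. I'm not sure where this level sits in the US. It surely can't match what US programmers earn, but it is better than what most programmers in China make.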

Salary Band	Minimum	Midpoint	Maximum
A	$16,104	$23,748	$31,392
B	$17,712	$26,124	$34,524
C	$19,152	$28,728	$38,304
D	$21,072	$31,620	$42,156
E	$23,604	$35,400	$47,196
F	$26,436	$39,660	$52,872
G	$29,136	$44,412	$59,712
H	$33,192	$50,616	$68,040
I	$37,848	$57,696	$77,580
J	$42,444	$65,772	$89,136
K	$49,236	$76,308	$103,392
L	$57,120	$88,524	$119,928
M	$66,240	$102,672	$139,116
N	$78,168	$121,152	$164,148
O	$90,768	$142,968	$195,168
P	$107,124	$168,696	$230,280
Q	$126,396	$199,056	$271,728
R	$151,668	$238,872	$326,088

A Bioinformatics Analyst position is available within the Bioinformatics Consulting Center at The Pennsylvania State University.

The position is supported by the Huck Institutes for the Life Sciences and requires the candidate to work with multiple project investigators to design and implement computational pipelines for data produced by high throughput sequencing instruments and others, with particular emphasis on metagenomics and microbiome analyses.

Responsibilities include the following: developing and/or maintaining existing software pipelines for analyzing high throughput sequencing data; identifying, evaluating and documenting new methodologies to support ongoing research needs; writing code and developing solutions to computational biology problems, with particular emphasis on microbiome and related samples. The Bioinformatics Analyst will become part of an interdisciplinary team composed of other bioinformatics staff, students and researchers and is expected to interact with other life scientists at Penn State and our international partner institutions in Africa and Asia to assist them with identifying research goals, analytical support needs, while carrying out computational data analysis as needed. It is anticipated that approximately 50% of your effort will initially be dedicated to providing bioinformatics support and microbiome analysis pipeline development for high-profile collaborative infectious disease surveillance research and training projects in Tanzania as well as other countries in East Africa and South Asia and may involve a limited amount of international travel (once per year). This job will be filled as a level 3, or level 4, depending upon the successful candidate's competencies, education, and experience. Typically requires a Master's degree or higher in a field of study with focus on computational research methods or higher plus four years of related experience, or an equivalent combination of education and experience for a level 3. Additional experience and/or education and competencies are required for higher level jobs. In-depth understanding of the computational analysis required for processing data from genomic technologies and their applications: Microbiome, metagenomics, RNA-Seq, genome assembly, genomic data visualization, or others. 
Expertise in handling and processing data in common bioinformatics formats; knowledge of available bioinformatics tools and genomic data repositories; proven track record of delivering bioinformatics solutions; demonstrated programming skills in one or more programming languages: Python, Perl, Java, C and/or numerical platforms: R, MATLAB, Mathematica. Experience handling large data sets generated from sequencing instruments. Excellent communication skills. This is a fixed-term appointment funded for one year from date of hire with excellent possibility of re-funding.

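Jul 17

Reading notes: nucleic acid and protein sequence analysis

Posted on July 17, 2015 by ulwvfje

Nucleic acid sequence analysis:
I. Basic DNA sequence analysis: BioEdit, DNAMAN, DNASTAR
a)   Composition analysis
b)   Sequence conversion
c)   Restriction site analysis (REBASE, NEBcutter2)
II. DNA sequence feature analysis
a)   Open reading frame (ORF) analysis
b)   Promoters and transcription factor binding sites (databases: EPD, TRANSFAC, DBTSS, TRRD) (software: Promoter Scan, TfBlast, TESS)
c)   CpG island identification (online tools: EMBOSS, CpG Island Searcher, CpG cluster)
III. Repeat sequence analysis
a)   Common resources (RepBase, RepeatMasker, LINE-1, STR)
IV. Gene identification
a)   Homology-based sequence alignment
b)   Prediction from compositional statistics (UTR, exon, intron)
c)   GENSCAN, GRAIL, GeneMarkS, Glimmer
V. mRNA alternative splicing analysis
a)   Common databases (ASTD, ASD, ASAP)
b)   Online tools (ASPicDB, ESEfinder, RESCUE-ESE)
VI. miRNA and target gene prediction
a)   Prediction methods (homologous fragment search, comparative genomics, scoring of sequence and structure features, target prediction, machine learning)
b)   Database resources (miRNA, miRBase, TarBase, miRGen, MIRSCAN, MiPred, miRFinder, miRanda, TargetScan, PicTar)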
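Item II a) (ORF analysis) can be illustrated with a minimal sketch of a forward-strand ORF scan. It assumes the standard genetic code and ATG as the only start codon. The 30-nt minimum length is an arbitrary illustrative default. Real ORF finders typically also scan the reverse strand (for example, on the reverse complement) and may allow alternative start codons.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len: int = 30) -> list:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Within a frame, only the first ATG after the previous stop opens an ORF,
    so nested ATGs are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC", min_len=0))
```

An ORF still open when the sequence ends (no stop codon) is not reported. That is a design choice of this sketch.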

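Protein sequence analysis
I. Basic protein sequence analysis
a)   Amino acid composition (ExPASy)
b)   Charge and hydrophobicity (ProtScale)
c)   Physicochemical properties (ProtParam)
II. Sequence feature information
a)   Transmembrane region prediction (TMpred)
b)   Signal peptide analysis (SignalP)
c)   Coiled-coil analysis (COILS)
III. Functional information analysis
a)   PROSITE protein function database
b)   Domain and functional site analysis: InterProScan
c)   Homology-based function analysis (blastp, etc.)
IV. Protein structure analysis
a)   Secondary structure prediction (Chou-Fasman, GOR, PHD, NNSSP, and consensus methods)
b)   Tertiary structure prediction (homology modeling, ab initio prediction)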
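For the hydrophobicity item (ProtScale, ProtParam), here is a minimal sketch of GRAVY (grand average of hydropathy) and a sliding-window hydropathy profile using the Kyte-Doolittle scale. Non-standard residues are simply skipped, which is an assumption of this sketch. The window is unweighted; the real web tools offer more options and may handle edge cases differently.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _values(protein: str) -> list:
    """Hydropathy value of each standard residue; other characters are skipped."""
    return [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    values = _values(protein)
    if not values:
        raise ValueError("no standard amino acids in input")
    return sum(values) / len(values)


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Unweighted sliding-window mean hydropathy (one value per window position)."""
    values = _values(protein)
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    print(round(gravy("MKTIIALSYIFCLVFA"), 3))
```

A positive GRAVY suggests an overall hydrophobic protein. High peaks in the profile are the kind of signal that transmembrane predictors look at more carefully.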

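Jul 03

A complete guide to getting a Shenzhen hukou as a graduate

Posted on July 3, 2015 by ulwvfje

Step 1: Online self-assessment

Log in to the official website of the Shenzhen Human Resources and Social Security Bureau (www.szhrss.gov.cn). Go to "网上办事" (Online Services) → "网上申办" (Online Application) → "深圳市人才引进(毕业生、在职人才引进)测评与申报系统" (Shenzhen Talent Introduction Assessment and Application System, for graduates and employed talent), and register a personal account. Log in as an individual user, choose "毕业生接收" (Graduate Acceptance), and fill in your personal information as prompted. When you are done, click "保存" (Save), then "按当前填报信息测评" (Assess based on the information entered). The system determines whether you qualify under the graduate acceptance policy and lists the policy clauses you meet.

You can also go straight to the assessment site, register, and fill in your information: https://sz12333.gov.cn/rcyj/

PS: Fill in your information truthfully. After submitting, wait for the review, which usually takes three to five working days. Barring special circumstances, applications are approved. Once you see that you have been approved, you can move on to Step 2!

Step 2: Sign a personnel agency agreement in person

If you qualify under the graduate acceptance policy, sign an entrustment agreement with a human resources agency approved by the municipal Human Resources Bureau, authorizing it to handle the graduate acceptance procedures for you.

You need to bring the following required documents: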

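1. Application form for accepting fresh college graduates (接收高等院校应届毕业生呈报表) (original collected)

2. Graduate recommendation form and transcript (originals collected)

3. Diploma and degree certificate (if you have already graduated when applying: originals checked, copies collected; if not yet graduated: originals checked when you report, no copies collected)

4. ID card (copy collected)

5. Household register (户口簿) or household registration certificate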

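Bring every original you can, and make a photocopy of every original!

There are many agencies; pick whichever is most convenient. I went to the Shenzhen Talent Exchange Service Center (High-Tech Zone branch).

This step must be done in person and involves queuing, possibly for two to three hours. There is also a fee of roughly 260 yuan, and cards are accepted.

PS: This step means taking time off work, so be sure to bring every document!!! The procedure itself is simple; the real problem is the long queue. Afterwards you get a receipt. Following its instructions, you can check after about 15 working days whether the application succeeded. If it did, come back once more to collect the acceptance letter!!!

Step 3: Use the Shenzhen acceptance letter to get your employment registration certificate (报到证) and hukou transfer certificate from your university

If you have just graduated and have no 报到证 yet, this step is simple: ask a classmate at your university to handle it for you.

If a 报到证 has already been issued to you (usually assigning you back to your home province), you need to have it reissued (改派). This is actually simple: check your university's reissue procedure and ask a classmate to mail you the new certificate. If your personnel file is still at the university, ask the university archives to send it to Shenzhen by confidential mail (about 15 days). If your file has been sent home or elsewhere, call the office that holds it and ask them to send it to Shenzhen by confidential mail (about 15 days)!

If your hukou is registered at your university, it's easy: just get a hukou transfer certificate from your university.

If your hukou is in your hometown, things get complicated and may involve converting from agricultural to non-agricultural status; it depends on what your family can arrange!

PS: After getting the 报到证 and hukou transfer certificate from your university with the Shenzhen acceptance letter, keep checking online whether your file has arrived in Shenzhen.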

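Step 4: Get the introduction letter and the Shenzhen settlement information card (深圳市入户人员信息卡)

This step involves no queuing and is done at the Luohu Talent Market. You need the originals and one copy each of your ID card, diploma, degree certificate, academic credential verification report, 报到证, and hukou transfer certificate. Not one can be missing!!!

PS: The academic credential verification report can be obtained from CHSI (学信网); make sure it remains valid for at least one more year!!!

Step 5: Register your hukou and apply for an ID card at the police station

This requires an appointment!

Important enough to say three times! If you have booked an appointment, you can go straight to the police station after getting the introduction letter and settlement information card from the Luohu Talent Market!!! If not, you will have to wait another week for an appointment slot before you can go.

Besides the introduction letter and settlement information card from the Luohu Talent Market, you also need the digital photo receipt and your ID card, plus copies of each!!!

PS: If your hukou is going to the High-Tech Park police station, there is a shortcut: the Mindray police service office (迈瑞警务室) can also complete the registration!

That's it, your hukou registration is done! Collect your new ID card from the police station after ten working days. Pretty simple, right, kids?

And don't forget the final bonus! The online rental subsidy system for newly recruited talent in Shenzhen:

https://sz12333.gov.cn/szhr_pubtalent/talent_login.jsp

Click through for a pleasant surprise.

Eligibility: bachelor's degree holders under 30, master's under 35, PhDs under 40. The rental subsidy is 6,000 yuan per person for a bachelor's degree, 9,000 yuan for a master's, and 12,000 yuan for a PhD.

To sum up, you need to take time off three or four times. First to sign the talent-introduction agency agreement, then to return to the same agency for the acceptance letter. Next to the Luohu Talent Market for the introduction letter and settlement information card, and finally to the police station for hukou registration and the new ID card!

If you read this guide carefully and follow the process, it should be fairly simple. Of course, the latest requirements from each office take precedence. Remember: bring more documents rather than fewer. If you forget anything, nobody will make an exception for you, and you start over! If you have any questions, feel free to ask me on QQ: 1227278128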

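Jun 08

How each processing step affects SNP calling

Posted on June 8, 2015 by ulwvfje

Many standard SNP-calling pipelines include a step that removes PCR duplicates, but how much does this step actually affect SNP detection? Let's find out.

SNPs before dedup	Sample	SNPs after dedup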
106082	BC1-1.snp	103829
101443	BC1-2.snp	99500
103937	BC2-1.snp	101833
102979	BC2-2.snp	101022
105876	BC3-1.snp	103562
109168	BC3-2.snp	107052
107155	BC4-1.snp	104894
108335	BC4-2.snp	106031
100236	BC5-1.snp	98417
102322	BC5-2.snp	100395
103466	BC6-1.snp	101405
112940	BC6-2.snp	110611
113166	BC7-1.snp	110948
114038	BC7-2.snp	116090
123670	PC1-1.snp	121697
111402	PC1-2.snp	109389
106917	PC2-1.snp	105149
108724	PC2-2.snp	106776
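To check the size of the effect, a small Python snippet can compute the loss per sample from the counts in the table above (copied as-is). It reports both the absolute and the relative change.

```python
# (before dedup, after dedup) SNP counts copied from the table above.
COUNTS = {
    "BC1-1": (106082, 103829), "BC1-2": (101443, 99500),
    "BC2-1": (103937, 101833), "BC2-2": (102979, 101022),
    "BC3-1": (105876, 103562), "BC3-2": (109168, 107052),
    "BC4-1": (107155, 104894), "BC4-2": (108335, 106031),
    "BC5-1": (100236, 98417), "BC5-2": (102322, 100395),
    "BC6-1": (103466, 101405), "BC6-2": (112940, 110611),
    "BC7-1": (113166, 110948), "BC7-2": (114038, 116090),
    "PC1-1": (123670, 121697), "PC1-2": (111402, 109389),
    "PC2-1": (106917, 105149), "PC2-2": (108724, 106776),
}


def summarize(counts):
    """Map each sample to (SNPs lost, percent lost); a negative loss means a gain."""
    return {s: (b - a, 100.0 * (b - a) / b) for s, (b, a) in counts.items()}


if __name__ == "__main__":
    for sample, (lost, pct) in summarize(COUNTS).items():
        note = "  (more SNPs after dedup)" if lost < 0 else ""
        print(f"{sample}\t{lost}\t{pct:.2f}%{note}")
```

Run on this table, every sample except BC7-2 loses between 1,768 and 2,329 SNPs, roughly 2% of its calls.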

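As you can see, removing PCR duplicates has only a small effect on the SNP-calling results. Most samples lose roughly 1,800 to 2,300 SNPs, about 2% of the total. BC7-2 is the odd one out, with more SNPs listed after deduplication. The script is below; I used picard-tools to remove PCR duplicates, but samtools can do the same step.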

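[shell]
# Remove PCR duplicates from every coordinate-sorted BAM with Picard 1.119
for i in *.sorted.bam
do
    echo "$i"
    # ${i%%.*} strips everything from the first dot: BC1-1.sorted.bam -> BC1-1.sort.dedup.bam
    java -Xmx120g -jar /home/jmzeng/snp-calling/resources/apps/picard-tools-1.119/MarkDuplicates.jar \
        CREATE_INDEX=true REMOVE_DUPLICATES=true \
        ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT METRICS_FILE=/dev/null \
        INPUT="$i" OUTPUT="${i%%.*}.sort.dedup.bam"
done
[/shell]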
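The following heavily simplified single-end sketch gives some intuition for what MarkDuplicates decides. Reads sharing a chromosome, an unclipped 5' position and a strand form one group. Within each group, every read except the one with the highest base-quality sum is flagged. This only approximates Picard's documented behaviour: read pairs (matched on both mates), optical duplicates and other details are left out. The `Read` fields are hypothetical, not Picard's data model.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    five_prime: int  # unclipped 5' coordinate of the alignment
    reverse: bool    # True if the read maps to the reverse strand
    qual_sum: int    # sum of the read's base qualities


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates (single-end simplification)."""
    groups = defaultdict(list)
    for r in reads:
        groups[(r.chrom, r.five_prime, r.reverse)].append(r)
    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.qual_sum)  # keep the best-quality read
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 100, False, 300),
        Read("r2", "chr1", 100, False, 350),  # same start and strand as r1, higher quality
        Read("r3", "chr1", 100, True, 300),   # same start, other strand: not a duplicate
        Read("r4", "chr1", 200, False, 100),
    ]
    print(sorted(mark_duplicates(reads)))
```

With `REMOVE_DUPLICATES=true`, the flagged reads are dropped from the output BAM instead of just being marked. That is why depth falls at sites covered by many identical fragments.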

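Let's first look at how the SNPs found in both runs changed. The first listing is from the deduplicated run (../rmdup), the second from the original run: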

head -50 ../rmdup/out/snp/BC1-1.snp | tail | cut -f 1,2,8

chr1 17222 ADP=428;WT=0;HET=1;HOM=0;NC=0
chr1 17999 ADP=185;WT=0;HET=1;HOM=0;NC=0
chr1 18091 ADP=147;WT=0;HET=1;HOM=0;NC=0
chr1 18200 ADP=278;WT=0;HET=1;HOM=0;NC=0
chr1 24786 ADP=238;WT=0;HET=1;HOM=0;NC=0
chr1 25072 ADP=24;WT=0;HET=1;HOM=0;NC=0
chr1 29256 ADP=44;WT=0;HET=1;HOM=0;NC=0
chr1 29265 ADP=44;WT=0;HET=1;HOM=0;NC=0
chr1 29790 ADP=351;WT=0;HET=1;HOM=0;NC=0
chr1 29939 ADP=109;WT=0;HET=1;HOM=0;NC=0

head -50 BC1-1.snp | tail | cut -f 1,2,8

chr1 17222 ADP=457;WT=0;HET=1;HOM=0;NC=0
chr1 17999 ADP=196;WT=0;HET=1;HOM=0;NC=0
chr1 18091 ADP=155;WT=0;HET=1;HOM=0;NC=0
chr1 18200 ADP=313;WT=0;HET=1;HOM=0;NC=0
chr1 24786 ADP=254;WT=0;HET=1;HOM=0;NC=0
chr1 25072 ADP=25;WT=0;HET=1;HOM=0;NC=0
chr1 29256 ADP=46;WT=0;HET=1;HOM=0;NC=0
chr1 29265 ADP=46;WT=0;HET=1;HOM=0;NC=0
chr1 29790 ADP=440;WT=0;HET=1;HOM=0;NC=0
chr1 29939 ADP=123;WT=0;HET=1;HOM=0;NC=0

So the same SNP positions are still called; only their sequencing depth (ADP) changes.
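To quantify the depth change for shared sites, a minimal sketch can parse lines in the `chrom pos INFO` form produced by `cut -f 1,2,8` above and pair up the ADP values. The example input reuses three of the sites listed above.

```python
def parse_site(line: str):
    """Split 'chrom pos INFO' (output of cut -f 1,2,8) into (chrom, pos, INFO dict)."""
    chrom, pos, info = line.split()
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return chrom, int(pos), fields


def depth_change(before_lines, after_lines):
    """For sites present in both call sets, return {(chrom, pos): (ADP before, ADP after)}."""
    before = {(c, p): int(f["ADP"]) for c, p, f in map(parse_site, before_lines)}
    after = {(c, p): int(f["ADP"]) for c, p, f in map(parse_site, after_lines)}
    return {site: (before[site], after[site]) for site in before.keys() & after.keys()}


if __name__ == "__main__":
    # Three of the sites shown above: original run vs deduplicated run
    original = ["chr1 17222 ADP=457;WT=0;HET=1;HOM=0;NC=0",
                "chr1 25072 ADP=25;WT=0;HET=1;HOM=0;NC=0",
                "chr1 29790 ADP=440;WT=0;HET=1;HOM=0;NC=0"]
    dedup = ["chr1 17222 ADP=428;WT=0;HET=1;HOM=0;NC=0",
             "chr1 25072 ADP=24;WT=0;HET=1;HOM=0;NC=0",
             "chr1 29790 ADP=351;WT=0;HET=1;HOM=0;NC=0"]
    for (chrom, pos), (b, a) in sorted(depth_change(original, dedup).items()):
        print(chrom, pos, b, a, f"{a / b:.1%} of reads kept")
```

For these three sites, between roughly 80% and 96% of the reads survive deduplication.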

Next, let's see what the SNPs lost after PCR-duplicate removal looked like in the original call set.

perl -alne 'unless ($file) { $hash{"$F[0]_$F[1]"} = 1 } else { print unless exists $hash{"$F[0]_$F[1]"} } $file++ if eof(ARGV);' ../rmdup/out/snp/BC1-1.snp BC1-1.snp | less

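This one-liner first reads the deduplicated SNP file and stores every chrom_pos site. It then prints only those lines of the original (non-deduplicated) SNP file whose site was not seen, i.e. the SNPs found only without PCR-duplicate removal. Their depth (ADP) is summarized below: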
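For readers more comfortable with Python, here is a sketch of the same filtering plus the depth summary. It assumes the VCF-style layout seen above: `#` header lines, chrom and position in columns 1-2, and ADP inside the INFO column (column 8). The script name in the usage comment is hypothetical. The "inclusive" quartile method matches the default used by R's `summary()`, which the numbers below appear to come from.

```python
import statistics
import sys


def is_record(line: str) -> bool:
    """True for data lines (skip blank lines and '#' header lines)."""
    return bool(line.strip()) and not line.startswith("#")


def site_key(line: str) -> str:
    """Site ID 'chrom_pos' from the first two whitespace-separated columns."""
    fields = line.split()
    return f"{fields[0]}_{fields[1]}"


def unique_sites(dedup_lines, original_lines):
    """Yield records from the original call set whose site is absent after dedup."""
    seen = {site_key(line) for line in dedup_lines if is_record(line)}
    for line in original_lines:
        if is_record(line) and site_key(line) not in seen:
            yield line


def adp(line: str) -> int:
    """ADP value from the INFO column (column 8) of a VCF-style record."""
    info = line.split()[7]
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return int(fields["ADP"])


def depth_summary(depths):
    """(min, Q1, median, mean, Q3, max), like R's summary() of a numeric vector."""
    q1, med, q3 = statistics.quantiles(depths, n=4, method="inclusive")
    return min(depths), q1, med, statistics.mean(depths), q3, max(depths)


if __name__ == "__main__":
    # usage (hypothetical script name): python unique_sites.py ../rmdup/out/snp/BC1-1.snp BC1-1.snp
    with open(sys.argv[1]) as dedup, open(sys.argv[2]) as original:
        lost = list(unique_sites(dedup, original))
    print(depth_summary([adp(line) for line in lost]))
```

Using a set for the deduplicated sites mirrors the Perl hash: each lookup is constant time, so both files are read only once.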

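Min.   1st Qu.   Median   Mean    3rd Qu.   Max.
8.00   8.00      11.00    44.26   25.00     7966.00

So most of the lost SNPs have low sequencing depth. In each pair of examples below, the first line shows the site in the original pileup and the second in the deduplicated one: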

chr1 726325 a 9 CCC.ccc,^:, IEHGHHG/9
chr1 726325 a 5 C.c,^:, IGH/9

chr1 726338 g 16 TTT.ttt,,....,,, IHGI:9<HIIFIHC5H
chr1 726338 g 10 T.t,,...,, II:HIIFH5H
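Reading these pileup lines is easier once the read-base column is decoded. Below is a minimal sketch of a parser for that column. It follows the symbols described in the samtools mpileup documentation, but it ignores base qualities and treats any A/C/G/T/N as a mismatch.

```python
import re


def count_pileup_bases(bases: str) -> dict:
    """Count reference matches, mismatches and deletions in an mpileup base column.

    '.'/',' = match on forward/reverse strand; A/C/G/T/N in either case = mismatch;
    '^' starts a read and is followed by one mapping-quality character; '$' ends a read;
    '+N...'/'-N...' is an indel whose N bases follow; '*' marks a deleted base.
    Reference skips ('<', '>') and anything else are ignored.
    """
    counts = {"ref": 0, "alt": 0, "del": 0}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # skip '^' and its mapping-quality character
            continue
        if c in "+-":
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)  # skip sign, length digits and indel bases
            continue
        if c in ".,":
            counts["ref"] += 1
        elif c in "ACGTNacgtn":
            counts["alt"] += 1
        elif c == "*":
            counts["del"] += 1
        i += 1  # '$' and other symbols are simply skipped
    return counts


if __name__ == "__main__":
    print(count_pileup_bases("CCC.ccc,^:,"))  # original pileup at chr1 726325
    print(count_pileup_bases("C.c,^:,"))      # deduplicated pileup at chr1 726325
```

For the first pair above, the original line gives 3 reference and 6 non-reference bases, and the deduplicated line 3 and 2. This shows how removing duplicate reads can shrink the evidence for a variant.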

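So this step is somewhat useful. Then again, the final SNP filtering already removes SNPs with depth below 20, so these low-depth sites would have been discarded anyway.

However, a few SNPs with very high depth also disappeared after removing PCR duplicates! That's strange, and I'm still investigating.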

grep 13777 BC1-1.mpileup |head

chr1 13777 G 263 ........,.C,,,,,.,,,.......,,,..,....,,......,.....c,........,,,,,,,..,...,,,,,.........,......C.......,,,,,,,,,,.....,,,,,,,.,,,..C,,,,,,CC,c,,,...C..,,,,cC.C..CC.CC,,cc,.C...C,,,,CCc,c,,,,,,,c,C.C.CC...C.cc,c...,C.CCcc...,CCC.C.CC..CCC..CC.c,cc,cc,,cc,C.,,^!.^6.^6.^6.^!, HIHIIIIEIEIHGIIIFIHIG?IIIIHIIHIFHIIHICIIIHIIGIEIIGIIIGHIIIIIIHIIHIHIIIIIIIHII1I?GHHHEHHIIEIEHIIEIIHHIIFIIIFHIHIIIIHIHIIHIIHHIIEIIIIIIHIIIIIIIIIG1HIIIIHIHIEHIHIHIIIIIIIIIIIHICIHIIIIIEIIIIHICIHGGIIIIIIIIEHIHIIIIIIHFIGGIIIIGIIIGICIIIHIIIIIIIIIIIHHHIIIIIHIIHDDII>>>>>

grep 13777 BC1-1.rmdup.mpileup |head

chr1 13777 G 240 ........,.C,,,,,.,,,.......,,,..,....,,......,....c,......,,,,,,,..,...,,,,,.........,......C......,,,,,,,,,,.....,,,,,,,.,,,..C,,,,,,CC,c,,,...C..,,,,cC..CC.CC,cc,.C...C,,,,CCc,c,,,,,,,cC.C.C..C.c,c...,C.CCcc...,CC.C.CCC..C.c,cc,,c,.,,^!.^6.^6.^!, HIHIIIIEIEIHGIIIFIHIG?IIIIHIIHIFHIIHICIIIHIIGIEIIIIIHIIIIIHIIHIHIIIIIIIHII1I?GHHHEHHIIEIEHIIEIHHIIFIIIFHIHIIIIHIHIIHIIHHIIEIIIIIIHIIIIIIIIIG1HIIIIHIHIEHIHIIIIIIIIIIHICIHIIIIIEIIIIHICIHGGIIIIIIHIHIIIIIHFIGGIIIIGIIIGCIIIIIIIIIIHHIIIHIHDII>>>>

Then I looked up a few more:

chr8 43092928 . A T . PASS ADP=7966;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:255:7967:7966:6261:1663:20.9%:0E0:39:39:3647:2614:1224:439

chr8 43092908 . T C . PASS ADP=6968;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:255:7002:6968:5315:1537:22.06%:0E0:37:38:3022:2293:890:647

chr8 43092898 . T G . PASS ADP=6517;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:255:6517:6517:4580:1587:24.35%:0E0:38:38:2533:2047:920:667

chr7 100642950 . T C . PASS ADP=770;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:255:771:770:615:155:20.13%:3.9035E-51:38:38:277:338:65:90

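Finally I found the pattern!!! Their variant allele frequencies (FREQ) are all just above 20% (20.13% to 24.35%). Before PCR-duplicate removal they passed the SNP frequency threshold. Removing PCR duplicates changed the allele frequency at these sites, and they no longer passed the filter.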
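The FREQ values in these lines can be reproduced as AD / DP: for example, 1663 / 7966 ≈ 20.9%. The sketch below recomputes them from the FORMAT and sample columns and applies a cutoff. The 0.20 cutoff is an assumption, the threshold this run apparently used, not necessarily VarScan's default.

```python
FORMAT = "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR"
MIN_VAR_FREQ = 0.20  # assumption: the variant-frequency cutoff apparently used in this run


def sample_fields(sample: str, fmt: str = FORMAT) -> dict:
    """Pair the FORMAT keys with the values of one sample column."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def variant_freq(fields: dict) -> float:
    """Variant allele frequency as AD / DP (reproduces the FREQ values shown above)."""
    return int(fields["AD"]) / int(fields["DP"])


def passes(fields: dict, min_freq: float = MIN_VAR_FREQ) -> bool:
    """True if the site's variant allele frequency reaches the cutoff."""
    return variant_freq(fields) >= min_freq


if __name__ == "__main__":
    samples = {
        "chr8:43092928": "0/1:255:7967:7966:6261:1663:20.9%:0E0:39:39:3647:2614:1224:439",
        "chr7:100642950": "0/1:255:771:770:615:155:20.13%:3.9035E-51:38:38:277:338:65:90",
    }
    for site, sample in samples.items():
        f = sample_fields(sample)
        print(site, f["FREQ"], f"{variant_freq(f):.2%}", passes(f))
```

These sites sit only a few reads above the cutoff, so small changes in read counts matter. At chr7:100642950, for instance, removing just two variant-supporting reads (a hypothetical AD 153, DP 768) already gives 153/768 ≈ 19.9%, which fails the 20% cutoff.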

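May 05

A complete RNA-seq study guide!

Posted on May 5, 2015 by ulwvfje

Expect this to take about two months! If any of the cloud-drive links have expired, contact me directly on QQ (1227278128) or in my QQ group (201161227). All the resources can be found at http://pan.baidu.com/s/1jIvwRD8

A search turns up plenty of pipelines; here I share a few papers I found earlier.

Peking University also has lectures on the principles of RNA-seq:

Link: http://pan.baidu.com/s/1kTmWmv9 Password: 6yaz

I even have a BGI training course!!! It's a 5-day course, and I think the materials cost over 5,000 yuan back then!!!

Link: http://pan.baidu.com/s/1nt5OV5B Password: gyul

There are also videos on Youku; search for them yourself.

Then there are a few pipelines, i.e. bioinformatics analysis workflows. Even if you know nothing yet, you can get through by following a pipeline.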

# Put bowtie2, tophat, cufflinks and samtools on the PATH
export PATH=/share/software/bin:$PATH
# Build the bowtie2 index "chr21" used by TopHat
bowtie2-build ./data/GRCh37_chr21.fa chr21
# Align each paired-end sample with TopHat, guided by the reference annotation
tophat -p 1 -G ./data/genes.gtf -o P460.thout chr21 ./data/P460_R1.fq ./data/P460_R2.fq
tophat -p 1 -G ./data/genes.gtf -o C460.thout chr21 ./data/C460_R1.fq ./data/C460_R2.fq
# Assemble transcripts for each sample
cufflinks -p 1 -o P460.clout P460.thout/accepted_hits.bam
cufflinks -p 1 -o C460.clout C460.thout/accepted_hits.bam
# SAM copies of the alignments, convenient for inspection
samtools view -h P460.thout/accepted_hits.bam > P460.thout/accepted_hits.sam
samtools view -h C460.thout/accepted_hits.bam > C460.thout/accepted_hits.sam
# Merge the two assemblies (output goes to merged_asm/merged.gtf)
echo ./P460.clout/transcripts.gtf > assemblies.txt
echo ./C460.clout/transcripts.gtf >> assemblies.txt
cuffmerge -p 1 -g ./data/genes.gtf -s ./data/GRCh37_chr21.fa assemblies.txt
# Differential expression between P460 and C460
cuffdiff -p 1 -u merged_asm/merged.gtf -b ./data/GRCh37_chr21.fa -L P460,C460 -o P460-C460.diffout P460.thout/accepted_hits.bam C460.thout/accepted_hits.bam
# Index the BAM files, e.g. for viewing in a genome browser
samtools index P460.thout/accepted_hits.bam
samtools index C460.thout/accepted_hits.bam

And another one:

#!/bin/bash
# Approx 75-80m to complete as a script

cd ~/RNA-seq
ls -l data
tophat --help
head -n 20 data/2cells_1.fastq

# TopHat alignment: -g 2 reports at most 2 hits per read, -j supplies known splice junctions
time tophat --solexa-quals \
    -g 2 \
    --library-type fr-unstranded \
    -j annotation/Danio_rerio.Zv9.66.spliceSites \
    -o tophat/ZV9_2cells \
    genome/ZV9 \
    data/2cells_1.fastq data/2cells_2.fastq # 17m30s

time tophat --solexa-quals \
    -g 2 \
    --library-type fr-unstranded \
    -j annotation/Danio_rerio.Zv9.66.spliceSites \
    -o tophat/ZV9_6h \
    genome/ZV9 \
    data/6h_1.fastq data/6h_2.fastq # 17m30s

samtools index tophat/ZV9_2cells/accepted_hits.bam
samtools index tophat/ZV9_6h/accepted_hits.bam

cufflinks --help

# Quantify against the reference annotation only (-G); -b: fragment bias correction, -u: multi-read correction
time cufflinks -o cufflinks/ZV9_2cells_gff \
    -G annotation/Danio_rerio.Zv9.66.gtf \
    -b genome/Danio_rerio.Zv9.66.dna.fa \
    -u \
    --library-type fr-unstranded \
    tophat/ZV9_2cells/accepted_hits.bam # 2m

time cufflinks -o cufflinks/ZV9_6h_gff \
    -G annotation/Danio_rerio.Zv9.66.gtf \
    -b genome/Danio_rerio.Zv9.66.dna.fa \
    -u \
    --library-type fr-unstranded \
    tophat/ZV9_6h/accepted_hits.bam # 2m

# guided assembly (-g: the annotation guides assembly, novel transcripts allowed)
time cufflinks -o cufflinks/ZV9_2cells \
    -g annotation/Danio_rerio.Zv9.66.gtf \
    -b genome/Danio_rerio.Zv9.66.dna.fa \
    -u \
    --library-type fr-unstranded \
    tophat/ZV9_2cells/accepted_hits.bam # 16m

time cufflinks -o cufflinks/ZV9_6h \
    -g annotation/Danio_rerio.Zv9.66.gtf \
    -b genome/Danio_rerio.Zv9.66.dna.fa \
    -u \
    --library-type fr-unstranded \
    tophat/ZV9_6h/accepted_hits.bam # 13m

# Differential expression; -T treats the two samples as a time series
time cuffdiff -o cuffdiff/ \
    -L ZV9_2cells,ZV9_6h \
    -T \
    -b genome/Danio_rerio.Zv9.66.dna.fa \
    -u \
    --library-type fr-unstranded \
    annotation/Danio_rerio.Zv9.66.gtf \
    tophat/ZV9_2cells/accepted_hits.bam \
    tophat/ZV9_6h/accepted_hits.bam # 7m

head -n 20 cuffdiff/gene_exp.diff

# Sort genes by q-value (column 13 of gene_exp.diff)
sort -t$'\t' -g -k 13,13 cuffdiff/gene_exp.diff \
    > cuffdiff/gene_exp_qval.sorted.diff
head -n 20 cuffdiff/gene_exp_qval.sorted.diff
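As an alternative to `sort` on column 13, here is a small Python sketch that reads `gene_exp.diff` by column name and keeps only successfully tested genes below a q-value cutoff. It assumes the header uses the names `status`, `gene`, `log2(fold_change)` and `q_value`; check the first line of your own file. The 0.05 cutoff is an arbitrary example.

```python
import csv
import sys


def significant_genes(handle, max_q: float = 0.05):
    """Return (gene, log2 fold change, q-value) tuples with q <= max_q, sorted by q.

    Rows whose status is not 'OK' (e.g. NOTEST, LOWDATA) are skipped.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] != "OK":
            continue
        q = float(row["q_value"])
        if q <= max_q:
            rows.append((row["gene"], float(row["log2(fold_change)"]), q))
    return sorted(rows, key=lambda r: r[2])


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "cuffdiff/gene_exp.diff"
    with open(path) as fh:
        for gene, lfc, q in significant_genes(fh)[:20]:
            print(f"{gene}\t{lfc:.2f}\t{q:.3g}")
```

Selecting columns by header name rather than by position keeps the script working even if the column order differs between versions.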

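Apr 30

Alibaba's free server is a terrible experience!

Posted on April 30, 2015 by ulwvfje

For some reason, my site's admin backend has been very slow lately, and even publishing a post is slow. Very frustrating!

At first I thought the free space was nearly used up, but when I checked, I had barely used any of it. I really want to file a complaint with Alibaba!

I'm thinking of getting a server abroad and moving the site!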

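Apr 10

Collection of step-by-step software tutorials

Posted on April 10, 2015 by ulwvfje

These are tutorials I wrote a long time ago, on phylogenetic tree construction and genome-wide association studies (GWAS)!

gwas-plink分析教程.pdf (GWAS analysis with PLINK)
plink的统计基础.ppt (statistical basics of PLINK)
一步一步构建系统进化树.pdf (building a phylogenetic tree step by step)
一步一步运行blast.pdf (running BLAST step by step)
一步一步运行inparanoid蛋白聚类.pdf (InParanoid protein clustering step by step)
一步一步运行PLINK-part1.pdf (running PLINK step by step, part 1)
一步一步运行plink-part2.pdf (running PLINK step by step, part 2)
用PhyML构建系统发育树.pptx (building phylogenetic trees with PhyML)
进化树的构建分子原理.pdf (molecular principles of phylogenetic tree construction)

They are all on Baidu cloud drive (http://pan.baidu.com/s/1jIvwRD8 ) and in the QQ group's shared files (group 201161227)!

I probably won't write more tutorials like these for now: without a project, I just don't have the motivation to do that much work.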

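Mar 19

Plans for my personal website

Posted on March 19, 2015 by ulwvfje

Transcriptomics:

The data come from a publication in NCBI.

I've already run most of the transcriptome software pipelines; see my transcriptome summary.

trinity, tophat, cufflinks, RSeQC, RNAseq, GOseq, MISO, RSEM, khmer, screed, trimmomatic, transDecoder, vast-tools, picard-tools, htseq, cuffdiff, edgeR, DEseq, funnet, davidgo, wego, kobas, KEGG, Amigo, go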

Genomics:

The data come from the strawberry genome paper.

velvet, SOAPdenovo2, repeatmasker, RepeatScout, piler

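ChIP-seq:

An expert in my QQ group has offered to collaborate and write this part for me; I hope that's true!

Immune repertoire:

There isn't really mature software for this; it's basically igblastn plus the IMGT database. But it's my main focus, so I'll cover it in detail.

Whole exome:

I don't know much about this area; it seems to be mainly SNP calling.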

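SNP calling:

Here I plan to write my own software rather than only using other people's tools; the data will come from the directions above.

bwa, bowtie, samtools, GATK, VarScan.jar, annovar

Evolution:

The data are simply the genome data.

OrthoMCL, inparanoid, ClustalW, muscle, MAFFT, quickparanoid, blast2go, RAxML, phyML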

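Mar 19

Transcriptome summary

Posted on March 19, 2015 by ulwvfje

The site has been up for almost a month, and I've finally covered one area of bioinformatics completely, though of course only at a beginner's level. There are many deeper applications and analyses, and even the software I've covered has countless parameters to tune; it gets complicated. Transcriptomics isn't actually my strongest area, but for particular reasons I happened to work on three transcriptome projects, so I have a lot of material on it, and I'm sharing it with everyone! Later I'll post a site update plan, which will get to genomics and immune repertoires, the areas I'm best at. Here is a brief summary of the transcriptome work:

First, of course, my transcriptome category page:

http://www.bio-info-trainee.com/?cat=18

As before, I used a script to compile the list of posts for you: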

http://www.bio-info-trainee.com/?p=370 (Transcriptome: clusterProfiler, an R package for GO and KEGG enrichment)

http://www.bio-info-trainee.com/?p=359 (Transcriptome: GO enrichment with the WEGO website)

http://www.bio-info-trainee.com/?p=346 (Transcriptome: annotating Trinity output with TransDecoder)

http://www.bio-info-trainee.com/?p=271 (Transcriptome: cummeRbund notes)

http://www.bio-info-trainee.com/?p=255 (Transcriptome: differential expression analysis with edgeR)

http://www.bio-info-trainee.com/?p=244 (Transcriptome: counting gene expression with HTSeq)

http://www.bio-info-trainee.com/?p=166 (Transcriptome: using the Cufflinks suite)

http://www.bio-info-trainee.com/?p=156 (Transcriptome: using the aligner TopHat)

http://www.bio-info-trainee.com/?p=125 (Transcriptome assembly with Trinity: usage notes)

http://www.bio-info-trainee.com/?p=113 (Quality control of RNA-seq data with RSeQC)

I also explained how to download the data:

http://www.bio-info-trainee.com/?p=32

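The raw SRA data are first unpacked with the SRA Toolkit, then filtered and quality-checked, then assembled with Trinity, and the assembly is annotated. A separate branch finds differentially expressed genes, either with tophat + cufflinks + cummeRbund or with HTSeq + edgeR, followed by GO and KEGG pathway annotation, and so on.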
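The order of steps in the paragraph above can be sketched as a dry-run shell script. This is a hedged illustration, not the author's exact commands: the accession SRR000001, the index prefix genome/genome, annotation/genes.gtf, the thread and memory settings, and the function names run and rnaseq_pipeline are all hypothetical. By default it only prints each command, so it runs without any of the tools installed.

```shell
# Dry-run sketch of the workflow order (hypothetical paths and settings).
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"   # print the command line
  else
    "$@"                 # actually execute it
  fi
}

rnaseq_pipeline() {
  local acc="$1"
  local r1="${acc}_1.fastq"
  local r2="${acc}_2.fastq"
  # 1. Unpack the SRA archive into paired FASTQ files (SRA Toolkit)
  run fastq-dump --split-files "$acc.sra"
  # 2. Quality assessment (read filtering, e.g. trimmomatic, would also go here)
  run fastqc "$r1" "$r2"
  # 3. De novo assembly with Trinity; annotation (e.g. TransDecoder) would follow
  run Trinity --seqType fq --left "$r1" --right "$r2" --max_memory 20G --CPU 8
  # 4a. Reference route: TopHat alignment, then Cufflinks (cummeRbund later, in R)
  run tophat -p 8 -G annotation/genes.gtf -o "tophat_$acc" genome/genome "$r1" "$r2"
  run cufflinks -p 8 -G annotation/genes.gtf -o "cufflinks_$acc" "tophat_$acc/accepted_hits.bam"
  # 4b. Count route: per-gene read counts for edgeR (written to stdout)
  run htseq-count -f bam -r pos -s no "tophat_$acc/accepted_hits.bam" annotation/genes.gtf
  # 5. GO/KEGG enrichment of the resulting gene lists happens downstream, e.g. in R
}
```

After sourcing the file, `rnaseq_pipeline SRR000001` prints the six commands in order; setting `DRY_RUN=0` before sourcing would execute them instead, which requires every tool on the PATH.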

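All the code and post content are shared in my QQ group; you're welcome to join group 201161227, 生信菜鸟团 (the Bioinformatics Rookie Team)!

http://www.bio-info-trainee.com/?p=1 (Offline meetup: bioinformatics)

You're also welcome to download and use my Android app:

http://www.cutt.com/app/down/840375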

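Mar 17

My app is finally live!!!

Posted on March 17, 2015 by ulwvfje

I'm really not a programmer, and I don't have time to write an app myself. By chance I saw an app's splash screen saying it was powered by 简易APP工厂 (Simple App Factory). I searched for it and learned that they offer a platform for building your own app in a foolproof way. For now only the Android version seems to be free, but it's very practical!!!

It's really easy; if you're interested, you can make one yourself too!!!

I also searched for my site and found that it has finally been indexed by Baidu. Surprisingly, my old post 菜鸟生物信息学 (Bioinformatics for Newbies) even ranks second! My idea from long ago was to share the hardships and detours of my own learning so that those who come after can learn from them; I hope this helps more people!

生信菜鸟团 | Welcome to join group 201161227, offline meetup: Shenzhen University Town

I hope bioinformatics practitioners or students in Shenzhen will see this announcement. We could form an interest group to exchange what we've each learned, or collaborate on translating technical documents or writing manuals for common bioinformatics software.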

Mar 12

The bioinformatics rookie handbook

Posted on March 12, 2015 by ulwvfje


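Background: a biology undergraduate who can do some molecular biology experiments and knows some biology.

Goal: become a bioinformatics rookie and land a bioinformatics-related job.

1. Computer basics (Linux + Perl + R, or Python + MATLAB)

2. Bioinformatics fundamentals (sequencing + databases + data formats)

3. Research areas (whole genome, whole transcriptome, whole exome, targeted capture sequencing)

4. Application areas (cancer screening, prenatal diagnosis, epidemiology, personalized medicine)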


Mar 11

A three-part website-building tutorial for rookies

Posted on March 11, 2015 by ulwvfje


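House number (domain name) → room (hosting) → decoration (site source code)

Before I had my own website, I dreaded the process countless times. I imagined all kinds of trouble: having to learn HTML, SQL, PHP, JavaScript, and Dreamweaver, spending lots of time writing code, and paying a fortune for a domain and hosting, without even knowing what steps would follow once I had them. Everything looked so hard. Looking back now, I realize the whole process takes only an hour!!!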


Mar 09

Installing a code-highlighting plugin in WordPress

Posted on March 9, 2015 by ulwvfje


There are plenty of plugins that can highlight code; I just picked one. First, search WordPress's plugin directory for the SyntaxHighlighter Evolved plugin.


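Mar 07

Announcement: Shenzhen bioinformatics interest group

Posted on March 7, 2015 by ulwvfje

I hope bioinformatics practitioners or students in Shenzhen will see this announcement. We could form an interest group to exchange what we've each learned, or collaborate on translating technical documents or writing manuals for common bioinformatics software.
A little about me: I'm proficient in Perl and R, can get by in Python and MATLAB, and am skilled at configuring Linux environments for bioinformatics and setting up the major software packages. I can use most genome and transcriptome software fluently.
I'm planning to organize some introductory bioinformatics materials covering the following parts, plus some of my own essays.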

Common databases (NCBI, Ensembl, UCSC, UniProt, IMGT, KEGG, OMIM, TIGR, GO)
Common data formats (sam, vcf, gtf, psl, blast -m 8, fa, fq, genbank, bed, etc.)
Large international projects (1000 Genomes, HapMap, ENCODE, etc.)
Basic bioinformatics software (BLAST+ suite, fastqc, flash, blast, solexaQA, NGS-QC-toolkit, SRA-toolkit, fastx-toolkit)
SNP-calling software (bwa, bowtie, samtools, GATK, VarScan.jar, annovar)
Genome software (velvet, SOAPdenovo2, repeatmasker, RepeatScout, piler, OrthoMCL, inparanoid, ClustalW, muscle, MAFFT, quickparanoid, blast2go, RAxML, phyML)
Transcriptome software (trinity, tophat, cufflinks, RSeQC, RNAseq, GOseq, MISO, RSEM, khmer, screed, trimmomatic, transDecoder, vast-tools, picard-tools, htseq, cuffdiff, edgeR, DEseq, funnet, davidgo, wego, kobas, KEGG, Amigo, go)
Exome, epigenomics, and metagenomics software (to be decided)
Computer basics (Linux, Perl, R)

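Gradually I'll write a short introductory document for each of these, and if the group gets big enough we can also make polished slides. I hope to find bioinformatics friends in Shenzhen so we can exchange ideas, collaborate, and improve together.
After all, bioinformatics people don't have to work overtime, and running code doesn't keep us busy, so there's plenty of time to study techniques. Learning them together is the fastest way, and meeting in person makes discussion easy. On weekends we could get together and have fun. Preferably not people from 华大 (BGI); it's not discrimination, it's just that their location is too remote.
If you're interested, contact me on QQ 1227278128, or just call me at 15314025716.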


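生信菜鸟团

You're welcome to leave comments and join the discussion on the forum biotrainee.com, or follow the WeChat public account of the same name, biotrainee.

Category Archives: 杂谈-随笔 (Miscellany & Essays)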

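My bioinformatics videos are online!

Collected resources for the HapMap project

How a bioinformatics beginner can teach themselves programming

An MIT PhD decided to leave academia, sparking a heated discussion among thousands of people (Part 1)

Goodbye Academia

Salaries of bioinformatics engineers in the US

Reading notes: nucleic acid & protein sequence analysis

A complete guide for graduates to obtaining Shenzhen household registration (hukou)

Step 1: online personal assessment

Step 2: sign a personnel agency agreement in person

Step 3: use Shenzhen's acceptance letter to get your registration certificate and hukou transfer certificate from your university

Step 4: get the introduction letter and the Shenzhen household-registration information card

Step 5: go to the police station to handle your hukou and ID card

Investigating how each step affects SNP calling

Many standard SNP-calling pipelines include a step to remove PCR duplicates, but how much does this step actually affect SNP detection? Let's investigate.

A complete RNA-seq study handbook!

Takes two months! If the cloud-drive links inside have expired, contact me directly (QQ 1227278128) or via my group 201161227; all resources can be found at http://pan.baidu.com/s/1jIvwRD8

Alibaba's free server is a terrible experience!

Collection of step-by-step software tutorials

Plans for my personal website

Transcriptome summary

My app is finally live!!!

生信菜鸟团 | Welcome to join group 201161227, offline meetup: Shenzhen University Town

The bioinformatics rookie handbook

A three-part website-building tutorial for rookies

Installing a code-highlighting plugin in WordPress

Announcement: Shenzhen bioinformatics interest group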
