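- Next-generation sequencing data analysis: the videos can be found by searching on Youku. Below are the lecture notes and download links that accompany the videos.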
- Lecture Notes
- Lectures will appear below as they are presented. Homeworks are specified in each handout.
- Lecture 1 - slides, handouts, course information, homework and project information, introduction to computing, setting up your computer, basic Unix command line usage, organizing your projects, homework 1.
- Suggested reading: A Quick Guide to Organizing Computational Biology Projects, PLoS Comp Bio, 2009.
- Other blog posts on organizing projects and research workflow.
- Lecture 2 - slides, handouts, The GFF format, sequence ontologies, basic Unix commands: wc, grep, cut, sort, redirecting input and output streams, piping commands, processing a tabular file with UNIX tools, homework 2
- dataset for this lecture, the latest yeast genome feature file: saccharomyces_cerevisiae.gff
- Unix flags explained: Explain Shell
- Suggested reading: Unix for Biologists
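A Python analogue of tallying feature types with `cut` and `sort` is sketched below. It assumes a GFF3 file such as saccharomyces_cerevisiae.gff, where column 3 holds the feature type and an optional `##FASTA` directive marks the start of embedded sequence; `count_feature_types` is a hypothetical helper name, not part of the course material.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally column 3 (the feature type) of GFF3 lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:  # silently ignore malformed rows
            counts[fields[2]] += 1
    return counts
```

For example, `count_feature_types(open("saccharomyces_cerevisiae.gff")).most_common(5)` would list the five most frequent feature types.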
- Lecture 3 - slides, handouts, programming languages, downloading and installing a proper editor, introduction to the AWK programming language, tabular file processing, filtering by feature type, AWK one-liners explained, another collection of AWK one-liners, homework 3.
- Recommended editors: Komodo Edit or Sublime Text
- Suggested reading: The Art of Unix Programming
- Biostar question of the day, iterating through files: Very Bad Things
- Lecture 4 - slides, handouts, sequencing technologies, sequence representations, the FASTA format, processing FASTA files at the command line, homework 4.
- dataset for this lecture, sequencing reads from a 454 instrument: lec4.fa (6Mb)
- The FASTA format and the IUPAC codes
- Suggested reading: Molecular Biology for Computer Scientists,
- Biostar question of the day: Sequencing Technology Reviews
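Processing FASTA files comes down to joining the wrapped sequence lines under each `>` header. The sketch below is one minimal way to do that in Python, with a GC-content helper as an example of a per-sequence statistic; both function names are hypothetical, and ambiguous IUPAC codes (such as N) are assumed to be excluded from the GC denominator.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:  # ignore any text before the first header
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(seq: str) -> float:
    """G+C fraction over unambiguous A/C/G/T bases; IUPAC codes such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt
```

With the lecture dataset this might be used as `for name, seq in read_fasta(open("lec4.fa")): print(name, len(seq), round(gc_fraction(seq), 3))`.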
- Lecture 5 - slides, handouts, string matching, edit distances, regular expressions, local and global alignments, homework 5.
- Regular Expression tester
- Pairwise sequence alignment tools at EMBL
- Book: comprehensive discussion of basic bioinformatics concepts: Understanding Bioinformatics by Dr. Marketa Zvelebil and Jeremy Baum.
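Edit distance is computed with dynamic programming, the same idea that underlies global alignment. Below is a minimal sketch of the Levenshtein distance (unit cost for every substitution, insertion and deletion) that keeps only two rows of the table; real aligners use substitution matrices and gap penalties instead of unit costs.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions turning a into b."""
    # prev[j] = distance between the prefix of a seen so far and b[:j]
    prev = list(range(len(b) + 1))  # empty prefix of a: j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] vs empty b: i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` returns 3 (two substitutions and one insertion).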
- Lecture 6 - slides, handouts, introduction to using blast, legacy blast and blast+, preparing blast databases, performing a blastn query, formatting blast output, homework 6.
- Download blast executables
- Download: saccharomyces_cerevisiae.gff
- Book: detailed information on "legacy" blast (published in 2003): BLAST by Ian Korf, Mark Yandell and Joseph Bedell
- Lecture 7 - slides, handouts, using blast, formatting databases, using blastdbcmd to extract sequences, batch operations, formatting blast queries, homework 7.
- Official Blast documentation: BLAST Command Line Applications User Manual
- Lecture 8 - slides, handouts, blast score and E-values, search strategies, usage examples for blastn, blastp, blastx, tblastn, and tblastx, homework 8.
- Download the required dataset
- Optional: join the Applied Bioinformatics Study Room and find the course called Applied Bioinformatics at Penn State
- NCBI guide to BLAST E-Values: The Statistics of Sequence Similarity Scores
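The standard bit-score relation for the E-value, E = m * n * 2^(-S'), and its raw-score form, E = K * m * n * e^(-λS), can each be written as a small function. This sketch assumes m and n are already the effective query and database lengths (BLAST applies an edge-effect correction) and that K and λ are supplied for the scoring system in use; the function names and example numbers are illustrative only.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least bit_score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def evalue_from_raw(raw_score: float, query_len: int, db_len: int,
                    k: float, lam: float) -> float:
    """Same quantity from a raw score: E = K * m * n * exp(-lambda * S).

    K and lambda depend on the scoring matrix and gap penalties (assumed inputs).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)
```

Each extra bit of score halves the E-value: `evalue_from_bits(10, 100, 1000)` gives 97.65625, and `evalue_from_bits(11, 100, 1000)` gives half of that.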
- Lecture 9 - slides, handouts, quality encodings, phred scales, the FASTQ format, homework 9.
- The FASTQ format wiki page
- The Phred quality score
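Phred scores link a quality character to an error probability through Q = -10 * log10(P). A minimal decoding sketch follows, assuming the Sanger/Illumina 1.8+ offset of 33 by default (older Illumina 1.3 to 1.7 files used 64); the function names are hypothetical.

```python
import math
from typing import List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(P): P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def phred_from_probability(p: float) -> float:
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)
```

For example, `phred_scores("!I5")` gives [0, 40, 20], and a score of 20 corresponds to a 1% chance that the base call is wrong.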
- Lecture 10 - slides, handouts, file compression, gzip, zip, bz2, file archives, tarbombs, plotting FASTQ qualities, homework 10.
- Download the lecture-10.tar.gz dataset
- A tarbomb, handle with care ...
- The FastQC toolkit
- Lecture 11 - slides, handouts, installing tools, quality control, adapter trimming, error correction
- seqtk GitHub page
- Biostar Question of the Day: Fastq Quality Control Shootout
- Package manager Homebrew for Mac
- Suggested reading: Illumina.
- Simple sequence related utilities in the Sequence Manipulation Suite
- Software tools for adapter trimming:
- CutAdapt application note in EMBnet.journal, 2011
- fastq-mcf published in The Open Bioinformatics Journal, 2013
- PrinSeq application note in Bioinformatics, 2011
- Trimmomatic application note in Nucleic Acids Research, 2012, web server issue
- Trim Galore
- NGS Toolkit published in PLoS One, 2012
- Fastx Toolkit
- BioPieces, a suite of programs for sequence preprocessing
- Scythe, a Bayesian adapter trimmer
- FlexBar, Flexible barcode and adapter removal published in Biology, 2012
- SeqPrep
- TagDust published in Bioinformatics, 2009
- TagCleaner published in BMC Bioinformatics, 2010
- Libraries via R (Bioconductor): PIQA, ShortRead
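To make the idea of 3' adapter trimming concrete, here is a deliberately simplified sketch that only does exact matching: it cuts the read at the first full adapter occurrence, or at a trailing partial adapter of at least `min_overlap` bases. The tools listed above are more sophisticated (real trimmers typically tolerate sequencing errors in the adapter), so treat this as an illustration of the concept, not a replacement for them; `trim_3prime_adapter` and `min_overlap` are hypothetical names.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Exact-match 3' adapter removal (simplified illustration).

    Cuts at the first position i where the full adapter starts, or where the
    read's tail read[i:] equals a prefix of the adapter of at least
    min_overlap bases. Returns the read unchanged if nothing matches.
    """
    for i in range(len(read) - min_overlap + 1):
        tail = read[i:]
        n = min(len(tail), len(adapter))  # compare as far as both strings go
        if tail[:n] == adapter[:n]:
            return read[:i]
    return read
```

With the common Illumina adapter prefix AGATCGGAAGAGC, `trim_3prime_adapter("ACGTACGTAGATCGGAAG", "AGATCGGAAGAGC")` returns "ACGTACGT". Requiring a minimum overlap avoids trimming a base or two at the end of every read just because it happens to match the start of the adapter by chance.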
- Lecture 12 - slides, handouts, paired-end sequencing, quality control for paired-end sequencing, the bioawk language
- download the dataset for lecture 12: lec12.tar.gz (20Mb)
- bioawk GitHub page
- Paired-end-aware software tools for adapter trimming:
- Trimmomatic application note in Nucleic Acids Research, 2012, web server issue
- fastq-mcf published in The Open Bioinformatics Journal, 2013
- Lecture 13 - slides, handouts, paired-end sequencing, read stitching, automating tasks with shell scripts
- for the homework use the dataset for lecture 12: lec12.tar.gz (20Mb)
- Trimmomatic application note in Nucleic Acids Research, 2012, web server issue
- FLASH: Fast Length Adjustment of SHort reads
- Musket - a multistage k-mer spectrum based corrector
- Bash programming HOWTO
- Lecture 14 - slides, handouts, short read alignments, bwa, bowtie and other tools.
- official bwa website
- the bwa-mem paper, currently on arXiv
- bwa-mem rejection from Bioinformatics and bwa-mem open review
- Heng Li on wikipedia and on sourceforge
- The bowtie1 aligner and the bowtie2 aligner
- Lecture 15 - slides, handouts, the Sequence Alignment/Map (SAM) format
- dataset for the homework: lec15.fq.gz (3.8Mb)
- official SAM specification
- What do flags mean: Explain SAM flags
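The FLAG column packs twelve yes/no properties into one integer, one bit each. The sketch below decodes it; the bit values come from the SAM specification, while the short names mirror the mnemonics printed by `samtools flags` (treat the exact naming as a convenience, not a standard). `decode_flag` is a hypothetical helper.

```python
from typing import List

# Bit values from the SAM specification; names follow samtools' mnemonics.
SAM_FLAGS = [
    (0x1, "PAIRED"),          # template has multiple segments
    (0x2, "PROPER_PAIR"),     # each segment properly aligned
    (0x4, "UNMAP"),           # this segment is unmapped
    (0x8, "MUNMAP"),          # next segment (mate) is unmapped
    (0x10, "REVERSE"),        # sequence is reverse complemented
    (0x20, "MREVERSE"),       # mate is reverse complemented
    (0x40, "READ1"),          # first segment in the template
    (0x80, "READ2"),          # last segment in the template
    (0x100, "SECONDARY"),     # secondary alignment
    (0x200, "QCFAIL"),        # fails quality checks
    (0x400, "DUP"),           # PCR or optical duplicate
    (0x800, "SUPPLEMENTARY"), # supplementary alignment
]


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]
```

For example, `decode_flag(99)` returns ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1']: the first read of a proper pair, aligned to the forward strand with its mate on the reverse strand.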
- Lecture 16 - slides, handouts, the SAM/BAM format, sorting and indexing BAM files, using the samtools program
- The Samtools page
- What do SAM flags mean: Explain SAM flags
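Region queries on a sorted, indexed BAM file depend on the reference interval each alignment covers, which is derived from the POS column and the CIGAR string. A minimal sketch follows; it counts only the operations the SAM specification lists as consuming reference bases (M, D, N, = and X), and the helper names are hypothetical.

```python
import re
from typing import List, Tuple

_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases, per the SAM specification
_REF_CONSUMING = frozenset("MDN=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"unparseable CIGAR string: {cigar!r}")  # includes '*'
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in _REF_CONSUMING)


def alignment_end(pos: int, cigar: str) -> int:
    """1-based inclusive end coordinate, given the 1-based POS column."""
    return pos + reference_span(cigar) - 1
```

For example, `alignment_end(100, "5M2I3M1D4M")` is 112: the 2 inserted bases do not advance along the reference, while the 1-base deletion does.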
- Lecture 17 - slides, handouts, aligning paired-end reads, comparing and evaluating aligners, simulating sequencing reads with the wgsim tool
- The wgsim github page
- The dwgsim sourceforge page
- The simNGS tool
- ART: a next-generation sequencing read simulator Bioinformatics, 2012
- Benchmarking short sequence mapping tools, BMC Bioinformatics. 2013
- pIRS: Profile-based Illumina pair-end reads simulator, Bioinformatics. 2012
- GemSIM: general, error-model based simulator of next-generation sequencing data, BMC Genomics. 2012
- Optimizing information in Next-Generation-Sequencing (NGS) reads for improving de novo genome assembly
- Lecture 18 - slides, handouts, read duplication, visualizing alignments with IGV and IGB
- FastUniq: A Fast De Novo Duplicates Removal Tool for Paired Short Reads, PLoS One, 2012
- The old Pileup format
- Samtools text alignment viewer
- Integrative Genomics Viewer by the Broad Institute, Integrated Genome Browser by UNC Charlotte, Tablet - Next Generation Sequence Assembly Visualization
- Visualizing genomes, techniques and challenges, Nat Methods. 2010
- Neat visualizations via DNA Skittle
- Biostar question of the day: What tools/libraries do you use to visualize genomic feature data?
- Lecture 19, guest lecture by Nicholas Stoler - slides, the variant call format (VCF), calling variants with samtools mpileup
- The multi sample pileup format
- 5 Things to Know About SAMtools Mpileup
- The VCF format 4.1 specification
- The Variant Call Format and VCFtools Poster
- Suggested reading: A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data, Bioinformatics, 2011
- Biostar question of the day: How to distinguish heterozygotes and homozygotes from variants in VCF format
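For the question above, the genotype is read from the GT entry of each sample column: alleles are separated by `/` (unphased) or `|` (phased), 0 denotes the REF allele, 1 and up the ALT alleles, and `.` a missing call. The sketch below classifies a GT value; the category labels and function names are hypothetical, and haploid calls such as `1` simply fall into the homozygous categories.

```python
import re


def classify_genotype(gt: str) -> str:
    """Label a VCF GT value as 'hom_ref', 'het', 'hom_alt' or 'missing'."""
    alleles = re.split(r"[/|]", gt)  # '/' = unphased, '|' = phased
    if any(allele == "." for allele in alleles):
        return "missing"
    if len(set(alleles)) > 1:
        return "het"  # includes e.g. 1/2: two different ALT alleles
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def genotype_from_sample(format_field: str, sample_field: str) -> str:
    """Look up GT in one sample column using the keys from the FORMAT column."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    return classify_genotype(values["GT"])
```

For example, `genotype_from_sample("GT:DP:GQ", "0/1:23:99")` returns "het".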
- Lecture 20 - slides, handouts, origins of genome variations, more on SNP calling, successes and failures
- Genome Analysis Toolkit
- Recommended reading: the Genomes Unzipped blog
- Lecture 21 - slides, handouts, interval representation, BED and GFF formats, representing data
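The most common source of off-by-one errors when moving between these formats is the coordinate convention: GFF uses 1-based, closed intervals, while BED uses 0-based, half-open ones. A minimal conversion sketch for non-empty features follows; the function names are hypothetical.

```python
from typing import Tuple


def gff_to_bed(start: int, end: int) -> Tuple[int, int]:
    """GFF [start, end], 1-based closed -> BED [start, end), 0-based half-open."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end for a GFF feature")
    return start - 1, end


def bed_to_gff(start: int, end: int) -> Tuple[int, int]:
    """BED [start, end), 0-based half-open -> GFF [start, end], 1-based closed."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end for a non-empty BED interval")
    return start + 1, end


def bed_length(start: int, end: int) -> int:
    return end - start  # no +1 needed with half-open coordinates


def gff_length(start: int, end: int) -> int:
    return end - start + 1  # both ends are included
```

For example, the first ten bases of a chromosome are 1-10 in GFF and 0-10 in BED; in both cases the length is 10.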
- Lecture 22 - slides, handouts, interval operations: complement, extension, flanking, using the BedTools package
- Bedtools homepage and the latest documentation
- The old Bedtools PDF documentation contains many details that the new documentation lacks
- Biostar question of the day: What are the most common stupid mistakes in bioinformatics?
- Lecture 23 - slides, handouts, interval operations: intersect, window, selecting closest features
- Bedtools homepage and the latest documentation
- The bedops tool and a nice usage example
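The core of an intersect operation can be sketched with two pointers walking over sorted interval lists. This version assumes BED-style half-open coordinates on a single chromosome, and that each input list is sorted by start and already merged (no overlaps within a list); bedtools itself handles the general case and many more options. `intersect` here is a hypothetical function, not the bedtools API.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) as in BED


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlapping pieces of two sorted, internally non-overlapping interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # half-open: intervals that only touch do not overlap
            out.append((start, end))
        # advance whichever interval finishes first; it cannot overlap anything later
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out
```

For example, `intersect([(0, 10), (20, 30)], [(5, 25)])` gives [(5, 10), (20, 25)]. Each interval is visited once, so the cost is linear in the combined list length once the inputs are sorted.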
- Lecture 24 - slides, handouts, an introduction to genome assembly, using the velvet assembler, evaluating genome assemblies with QUAST
- The Velvet assembler, see also: Velvet: Algorithms for de novo short read assembly using de Bruijn graphs
- The Minia assembler, see also Space-efficient and exact de Bruijn graph representation based on a Bloom filter, WABI 2012
- The Fermi assembler, see also Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly, Bioinformatics, 2012
- Quast assembly evaluator, see also QUAST: quality assessment tool for genome assemblies
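QUAST reports N50 among its contiguity statistics. One common definition: sort the contigs longest first and report the length of the contig at which the running total first reaches half of the assembly size. A minimal sketch of that definition (the function name is hypothetical):

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length of the contig at which the running total of contig lengths,
    taken longest first, first reaches half of the total assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding total / 2
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For contig lengths 2 through 10 (total 54), the running total 10 + 9 + 8 reaches 27 at the contig of length 8, so N50 is 8.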
- Lecture 25 - slides, handouts, meta.tar.gz (25MB), an introduction to metagenomics, software packages mothur, QIIME and MetaSim, online tools RDP and MG-RAST
- Lecture 26 - slides, handouts, lec26.tar.gz (25MB), an introduction to ChIP-Seq technology, peak calling concepts, preprocessing and peak calling methods (part 1)
- Search Google for "Introduction to ChIP-Seq"; there are quite a few resources
- Recommended reading Applications of next-generation sequencing (Nature, resources)
- the bioawk-tools utilities
- Lecture 27 - slides, handouts, ChIP-Seq peak calling software, preprocessing and peak calling methods (part 2)
- The peak caller used in the ENCODE project: Model-based Analysis of ChIP-Seq (MACS), and Identifying ChIP-seq enrichment using MACS in Nature Protocols, 2012
- Site Identification from Short Sequence Reads
- A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs, Nature Protocols, 2012
- Lecture 28 - slides, handouts, lec28.tar.gz, basic RNA-Seq data analysis concepts, split read mapping
- The Tuxedo suite: Bowtie, Tophat, Cufflinks
- The rlsim simulation package and related paper Realistic simulations reveal extensive sample-specificity of RNA-seq biases
- The FLUX read simulator and related paper: Modelling and simulating generic RNA-Seq experiments with the flux simulator NAR, 2012
- Lecture 29 - slides, handouts, lec29.tar.gz, RNA-Seq (part 2)
- Blog: RPKM measure is inconsistent among samples. Damian Kao gives a simple and logical description of a paper that discusses the problems with RPKM
- CSHL Keynote 2013, Dr. Lior Pachter, UC Berkeley. TL;DR: use TPM (transcripts per million) instead of RPKM or FPKM
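The difference between the two measures is easiest to see side by side. In the sketch below both normalize by transcript length, but TPM then rescales by the sum of the length-normalized counts, so TPM values within a sample always add up to one million, whereas RPKM sums vary from sample to sample. For simplicity the library size in `rpkm` is assumed to be the sum of the given counts (in practice it is usually the total number of mapped reads), and both function names are hypothetical; an all-zero count vector would divide by zero.

```python
from typing import List, Sequence


def rpkm(counts: Sequence[int], lengths_bp: Sequence[int]) -> List[float]:
    """Reads per kilobase per million; library size assumed = sum(counts)."""
    millions = sum(counts) / 1e6
    return [c / (length / 1e3) / millions for c, length in zip(counts, lengths_bp)]


def tpm(counts: Sequence[int], lengths_bp: Sequence[int]) -> List[float]:
    """Transcripts per million: length-normalize first, then rescale to sum to 1e6."""
    rpk = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]
```

For example, counts of 10, 20 and 30 on transcripts of 1, 2 and 3 kb have equal length-normalized abundance, so each receives a TPM of one third of a million.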
- Lecture 30 - slides, handouts, bioinformatics beyond the command line: using R for data analysis
- Final project - final-project handout, data for the final project: pony.tar.gz (17Mb). BMB 597D final project, 50% of the final grade, due 5 pm Saturday, Dec 14th, 2013