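In epigenomics pipelines, the main analysis output is a set of peak regions. This holds for ChIP-seq, ATAC-seq, and even single-cell ATAC-seq.
After the peaks are called, though, there is one more very important step: filtering out the ENCODE blacklist regions.
For example, the paper "Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin" (https://doi.org/10.1016/j.cell.2020.09.056) states: peaks overlapping with ENCODE blacklisted regions (https://sites.google.com/site/anshulkundaje/projects/blacklists) were filtered out.
The blacklists are distributed as the BED files listed below; any peak that falls in the coordinate regions they cover can be filtered out directly.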
NEW: VERSION 3 (05/20/2020)
- HUMAN (hg38/GRCh38): https://www.encodeproject.org/files/ENCFF356LFX/ (Manually curated)
  README: At the bottom of the above page
- HUMAN (hg19/GRCh37): ENCODE portal link: https://www.encodeproject.org/files/ENCFF001TDO/ (Manually curated, same as version 1)
  UCSC Genome browser track: http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeMapability
  README on how this track was generated: http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg19-human/hg19-blacklist-README.pdf
For other species you can use the Version 2 blacklists at https://github.com/Boyle-Lab/Blacklist/tree/master/lists
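
Once a blacklist BED file is downloaded, the filtering step itself is a plain interval-overlap test. The following is only a minimal sketch in pure Python, not the method used in the paper cited above. The function names `parse_bed` and `filter_blacklisted` and the toy coordinates are made up for illustration, and it assumes the peaks and the blacklist use the same genome build and the same chromosome naming (for example `chr1` rather than `1`).

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping headers and blanks."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        # Only the first three BED columns are needed; extra columns are ignored.
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def filter_blacklisted(peaks, blacklist):
    """Return the peaks that do not overlap any blacklist interval.

    BED intervals are 0-based and half-open, so two intervals on the same
    chromosome overlap when start_a < end_b and end_a > start_b.
    """
    by_chrom = {}
    for chrom, start, end in blacklist:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in peaks:
        regions = by_chrom.get(chrom, [])
        # A linear scan per peak; simple, and adequate when the blacklist is short.
        overlaps = any(start < b_end and end > b_start for b_start, b_end in regions)
        if not overlaps:
            kept.append((chrom, start, end))
    return kept


if __name__ == "__main__":
    # Toy coordinates for illustration only, not real blacklist regions.
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    blacklist = [("chr1", 150, 300), ("chr2", 200, 400)]
    print(filter_blacklisted(peaks, blacklist))
    # prints [('chr1', 500, 600), ('chr2', 100, 200)]
```

With real files, the lines can be read with `open(path)` and passed to `parse_bed`. On the command line, a common equivalent is `bedtools intersect -v -a peaks.bed -b blacklist.bed`, which reports the entries in `-a` that have no overlap with `-b`.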