illumina的bead 系列表达芯片扫盲

表达芯片大家最熟悉的当然是affymetrix系列芯片啦,而且分析套路很简单,直接用R的affy包,就可以把cel文件经过RMA或者MAS5方法得到表达矩阵。illumina出厂的芯片略微有点不一样,它的原始数据有3个层级,一般拿到的是Processed data (示例), 当仍然需要一系列的统计学方法才能提取到表达矩阵。接下来我们首先讲一讲illumina的bead 系列表达芯片基础知识吧:

illumina是大厂家,所以芯片包括人类的,小鼠以及大鼠的,然后对于人来说,经历了V1~V4的进化过程,最新版是 V4。
GEO里面是这样介绍illumina bead V4这个芯片的:
The HumanHT-12 v4 Expression BeadChip provides high throughput processing of 12 samples per BeadChip without the need for expensive, specialized automation. The BeadChip is designed to support flexible usage across a wide-spectrum of experiments.
可以在官网下载芯片探针的详情,manifest数据文件:http://support.illumina.com/array/array_kits/humanht-12_v4_expression_beadchip_kit/downloads.html 这些文件写清楚了芯片用的探针的详情,包括使用了哪些control探针,主要是给它自己的BeadStudio 软件来使用的。
NCBI的GEO也提供大批量的公共数据:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL10558
芯片厂家illumina本身提供数据处理软件BeadStudio, GenomeStudio,在R/bioconductor上面也有开源的包做同样的事情,Illuminaio ,beadarray,lumi
数据的前期处理有3个层次,都在bioconductor有对应的包可以来处理
Data can be in raw form, where pixel-level data are available from TIFF images, allowing the complete data processing pipeline, including image analysis, to be carried out in R. 这种图片格式的数据,基本上没有人愿意去开始处理的,TIFF格式的图片压缩包,在BeadArrayUseCases包里面附带有一个测试数据
The next level, referred to as bead-level, refers to the availability of intensity and location information for individual beads. In this format, a given probe will have a variable number of replicate intensities per sample. Processed data, where replicate intensities have been summarized and outliers removed to give a mean, a measure of variability, and a number of observations per probe in each sample, is the most commonly available format.
数据处理流程如下:
illumina-microarray-data-3-step-processing
其实对芯片数据处理最重要的过程,就是如何做QC以及拿到表达量矩阵,后面的差异分析,功能富集分析其实是大同小异的。我比较喜欢用bioconductor包,会讲如何用 lumi包来处理这个芯片数据。
用bioconductor系列包来处理是最方便的,看这个教程就够了:https://bioconductor.org/packages/release/data/experiment/vignettes/BeadArrayUseCases/inst/doc/BeadArrayUseCases.pdf
数据处理流程还在plos one杂志上面发表过文章:http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002276
BMC也有一篇:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4486126/ 他们团队做了一个网页版工具,直接可以上传illumina芯片的原始数据去做 全套分析:http://www.arrayanalysis.org/
在R/bioconductor里面,跟人类相关的illumina beadseed芯片注释包如下:
illuminaHumanv1.db Mark Dunning Illumina HumanWG6v1 annotation data (chip illuminaHumanv1)
illuminaHumanv2.db Mark Dunning Illumina HumanWG6v2 annotation data (chip illuminaHumanv2)
illuminaHumanv2BeadID.db Mark Dunning Illumina HumanWGv2 annotation data (chip illuminaHumanv2BeadID)
illuminaHumanv3.db Mark Dunning Illumina HumanHT12v3 annotation data (chip illuminaHumanv3)
illuminaHumanv4.db Mark Dunning Illumina HumanHT12v4 annotation data (chip illuminaHumanv4)
芯片包装如下:
illumina-microarray

 

Comments are closed.