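While browsing the literature over the Spring Festival holiday, I came across the article "OCTAD: an open workspace for virtually screening therapeutics targeting precise cancer patient groups using gene expression features" (https://www.nature.com/articles/s41596-020-00430-z). It mentions the GEO and ArrayExpress database resources. I had never had a real sense of their scale, and the numbers left me stunned: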
- The Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) from the National Center for Biotechnology Information is a public functional genomics data repository consisting of over 3 million samples from over 110,000 studies as of September 2019.
- ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) is another functional genomics dataset that has over 55 TB of data from over 70,000 experiments as of September 2019.
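Some followers have pointed out that the GEO China mirror provided by 《生信技能树》 does not really live up to its name, because it contains only a few tens of thousands of expression microarray datasets and is not a backup of the whole GEO database. Even after my recent upgrade (see the post "Your GEO China mirror is due for an upgrade"), I still do not dare to back up the other data types.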
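Money is honestly tight: these expression microarray matrices alone have already cost me 30,000 yuan. Covering all of the more than 100,000 datasets, including every kind of NGS omics data, would multiply that cost by at least 1,000, and right now I really cannot come up with tens of millions of yuan in spare cash for a public service. And then there are ArrayExpress's 55 TB of data, which I do not even dare to look at!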
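The back-of-the-envelope estimate above can be written out as a quick check. This is only a sketch: the 30,000 yuan figure and the factor of 1,000 are the rough numbers quoted in the text, not measured storage prices, and the variable names are made up for illustration.

```python
# Rough cost estimate using the figures quoted in the text (not measured values).
array_mirror_cost_yuan = 30_000  # spent so far on the expression microarray matrices
scale_factor = 1_000             # rough guess for covering every GEO data type

# Scaling up means multiplying the current cost by the factor.
full_mirror_cost_yuan = array_mirror_cost_yuan * scale_factor
print(f"Estimated full-mirror cost: {full_mirror_cost_yuan:,} yuan")
```

The result is 30,000,000 yuan, which matches the "tens of millions" mentioned above.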
For differential expression analysis, my "Data Mining" series of posts from five years ago is basically all you need;