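Differences between European-ancestry and African-ancestry American breast cancer patients can be validated in the TCGA database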

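The essence of data mining is shrinking the number of genes, and the core step that kicks off any data-mining project is grouping. You can group patients by whether any given gene is highly or lowly expressed, methylated or not, mutated or not, or you can group them by various biological-function concepts. As long as the grouping is reasonable, you can run differential analysis, then survival analysis and so on; once the gene list has been whittled down, the project is done.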
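As a minimal sketch of the grouping step, patients can be split into "high" and "low" groups around the median expression of one gene. The median cutoff, the function name `split_by_median`, and the toy values are all assumptions for illustration; the post does not prescribe a particular cutoff.

```python
from statistics import median

def split_by_median(expression):
    """Split sample IDs into 'high' and 'low' groups around the median value.

    expression: dict mapping sample ID -> expression value of one gene.
    Samples strictly above the median go to 'high', the rest to 'low'.
    """
    cutoff = median(expression.values())
    groups = {"high": [], "low": []}
    for sample, value in expression.items():
        groups["high" if value > cutoff else "low"].append(sample)
    return groups

# Made-up expression values for one hypothetical gene (not real data)
toy_expression = {"P1": 2.1, "P2": 8.4, "P3": 5.0, "P4": 9.7, "P5": 1.3, "P6": 6.2}
groups = split_by_median(toy_expression)
print(groups)
```

The same pattern applies to any binary label (methylated or not, mutated or not): once every patient has a group, the downstream differential and survival analyses no longer care how the label was derived.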

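Differential analysis should be familiar to everyone by now; my series of posts from six years ago on mining public expression-microarray databases basically covers it.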

For ordinary people, so-called multi-omics integration amounts to nothing more than this.

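Worse still, most people have no multi-omics mindset at all and still rely on traditional groupings, such as patient age or ethnic group, as in the paper below: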

Differences in gene-expression profiles in breast cancer between African and European-ancestry women 
Jie Ping, Xingyi Guo, Fei Ye, Jirong Long, Loren Lipworth, Qiuyin Cai, William Blot, Xiao-Ou Shu, Wei Zheng
Carcinogenesis, Volume 41, Issue 7, July 2020, Pages 887–893, https://doi.org/10.1093/carcin/bgaa035
Published: 08 April 2020

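The data come from the Southern Community Cohort Study (SCCS) cohort. The paper mainly compares breast cancer patients who are African American (AA) and European American (EA), using transcriptome sequencing: 260 AA and 155 EA patients are compared to find differences, which are then validated in 180 AA and 838 EA patients from the TCGA database.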

  • 19 065 genes (16 586 protein-coding and 2479 lincRNAs)
  • 2001 (10.5%) were differentially expressed in EA and AA at a nominal P value < 0.05,
  • among which 59 genes (54 protein-coding genes and 5 lincRNAs) reached an FDR-adjusted P value < 0.01
    • 31 genes expressed significantly higher in AA than EA women,
    • while the remaining 28 genes expressed significantly higher in EA than AA women
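The two thresholds above (nominal P value and FDR-adjusted P value) can be illustrated with a small sketch. It assumes the Benjamini-Hochberg procedure, which is the most common FDR adjustment but is not named in this post; the gene names and p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-gene nominal p-values (not taken from the paper)
toy_pvalues = {"GENE_A": 0.001, "GENE_B": 0.008, "GENE_C": 0.039,
               "GENE_D": 0.041, "GENE_E": 0.6}
names = list(toy_pvalues)
adjusted = benjamini_hochberg([toy_pvalues[g] for g in names])

# Genes passing the loose nominal cutoff vs. the strict FDR cutoff
nominal_hits = [g for g in names if toy_pvalues[g] < 0.05]
fdr_hits = [g for g, q in zip(names, adjusted) if q < 0.01]
print(nominal_hits, fdr_hits)
```

The gap between the two lists mirrors the gap in the paper between 2001 nominally significant genes and 59 genes surviving the FDR threshold: multiple-testing correction is what shrinks the list.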

The expression of these 59 genes in the two transcriptome cohorts is shown below:

[Figure: expression of the 59 genes in the two transcriptome cohorts]

They also ran survival analysis:

  • 10 of the 59 genes were associated with overall survival in AA but not in EA,
  • while 7 genes were associated with overall survival in EA but not in AA.
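This post does not say which survival model the paper used, so the following is only a stand-in: a minimal Kaplan-Meier estimator in plain Python, the kind of curve one would compute separately within each ancestry group (for example for high versus low expressers of a gene). The function name `kaplan_meier` and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set
        at_risk -= removed
    return curve

# Made-up follow-up data (months) for one hypothetical patient group
toy_times = [5, 8, 12, 12, 20]
toy_events = [1, 0, 1, 1, 0]
curve = kaplan_meier(toy_times, toy_events)
print(curve)
```

In practice a dedicated survival package would be used, along with a formal test or regression model for the gene effect; the sketch only shows what "associated with overall survival in one group but not the other" is computed from.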

I have shared the details of survival analysis many times on 生信技能树.

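Survival analysis is currently the finishing touch in research on tumors and other diseases! We have two free survival-analysis video courses on the 《生信技能树》 Bilibili channel; can you find them?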

In fact, once you have a gene set, you can run all the routine analyses on it!
