什么，需要5Tb内存啊！

众所周知，单细胞数据分析对计算机资源的要求有一点高，尤其是10X单细胞转录组数据的多样本合并那个步骤！

最近我就接到一个粉丝咨询，说他想处理一个公共数据集，只有8个原位肿瘤+3个转移肿瘤的10X单细胞转录组样品，但是数据处理的过程发现系统提示说需要5Tb内存，虽然说他自己有一个512G内存的服务器，但是也承受不起5Tb内存，问我有没有渠道！

额，给他配置一个5Tb内存服务器倒是简单，我自己就有2.5T内存的服务器，不就是加倍嘛！不过，我注意到他就是11个10X转录组样品，理论上不可能是需要5Tb内存的，所以让他把代码发过来我检查看看.

首先是读取多个10x单细胞转录组文件夹，需要保证每个文件夹下面都是有3个文件哦：

# For output from CellRanger < 3.0
# Should show barcodes.tsv, genes.tsv, and matrix.mtx
# For output from CellRanger >= 3.0 with multiple data types
 # Should show barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz
sceList = lapply(samples,function(pro){ 
 #pro=samples[1]
 folder=file.path('outputs',pro ) 
 print(pro)
 print(folder)
 print(list.files(folder))
 print( Sys.time() )
 sce=CreateSeuratObject(counts = Read10X(folder), 
 project = pro )
 print( Sys.time() )
 return(sce)
})

然后是直接走CCA整合：

sceList
for (i in 1:length(sceList)) {
 print( Sys.time() )
 sceList[[i]] <- NormalizeData(sceList[[i]], verbose = FALSE)
 sceList[[i]] <- FindVariableFeatures(sceList[[i]], selection.method = "vst", 
 nfeatures = 2000, verbose = FALSE)
}
sceList
Sys.time()
sce.anchors <- FindIntegrationAnchors(object.list = sceList, dims = 1:30)
# load('sce.anchors.RDATA')
Sys.time()
sce.integrated <- IntegrateData(anchorset = sce.anchors, dims = 1:30)
Sys.time()

看起来中规中矩，就是我一直教学使用的代码。没办法，我只好让他把数据发过来了。

我自己也读取看看，让我留意到了它居然每个样品有70万个细胞！！！所以我猜测应该是他的10X的3个文件里面并没有过滤，把全部的barcode输出了，我就给他加上了一个简单的检查代码，以及两个标准过滤：

lapply(sceList, function(x) print(x))
# retaining cells that had unique
# molecular identifiers (UMIs) greater than 400, 
# expressed 100 and 8000 genes
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
sceList = lapply(sceList, function(x) {
 subset(x, subset = nFeature_RNA > 200 & nFeature_RNA < 8000 & nCount_RNA > 400)
})

lapply(sceList, function(x) print(x))

# had mitochondrial content less than 10 percent.

sceList = sceList = lapply(sceList, function(x) {
 rownames(x)[grepl('^mt-',rownames(x),ignore.case = T)]
 x[["percent.mt"]] <- PercentageFeatureSet(x, pattern = "^MT-")
 subset(x, subset = percent.mt < 10 )
})
lapply(sceList, function(x) print(x))

接下来使用我的32G内存的Mac，都可以走这个CCA流程啦！

太有意思了，为什么我想讲解这个故事呢，因为在很多交流群都看到有粉丝问内存不够，实际上很多情况下，内存不够是因为你代码学的很差。

如果你连512G内存都没有呢？

临时使用的话，可以考虑我们的共享云哦！

一	二	三	四	五	六	日
« 九
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30

生信菜鸟团

欢迎去论坛biotrainee.com留言参与讨论，或者关注同名微信公众号biotrainee

什么，需要5Tb内存啊！

如果你连512G内存都没有呢？