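I downloaded a dbsnp_142 database file from NCBI and found that it is a full 2.5G in size. I find that hard to believe: after all, the human genome is only about 3G, that is, roughly 3 billion bases. Yet the number of variants that have been catalogued turns out to be 110,917,213, more than a hundred million!!!
Can anyone explain this to me?
And humans only have a little over 100,000 proteins and about 22,000 genes!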
jmzeng@ubuntu:/home/jmzeng/hoston/diff/snp$ wc -l dbsnp_142_chrom_id_rs
110917213 dbsnp_142_chrom_id_rs
jmzeng@ubuntu:/home/jmzeng/hoston/diff/snp$ tail dbsnp_142_chrom_id_rs
MT 16429 rs150751410
MT 16443 rs371960162
MT 16456 rs142662828
MT 16482 rs386419986
MT 16497 rs376846509
MT 16512 rs373943637
MT 16519 rs3937033
MT 16526 rs386829315
MT 16527 rs386829316
MT 16529 rs370705831
jmzeng@ubuntu:/home/jmzeng/hoston/diff/snp$ head dbsnp_142_chrom_id_rs
1 10108 rs62651026
1 10109 rs376007522
1 10139 rs368469931
1 10144 rs144773400
1 10150 rs371194064
1 10177 rs201752861
1 10177 rs367896724
1 10180 rs201694901
1 10228 rs143255646
1 10228 rs200462216
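
One thing the `head` output already hints at: several rs IDs can share the same position (for example `1 10177` and `1 10228` each appear twice), so the line count is a count of rs records, not of distinct genomic sites. The following is only a rough sketch for checking that, not a definitive analysis. The function names `count_per_chrom` and `count_distinct_sites` are made up for illustration, and the second one assumes that records for the same position sit on adjacent lines, as they do in the output above.

```shell
# Hypothetical helper: number of records per chromosome in a
# whitespace-separated 3-column file (chrom, pos, rsID).
count_per_chrom() {
    awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | LC_ALL=C sort -k1,1
}

# Hypothetical helper: number of distinct (chrom, pos) sites.
# Assumption: lines sharing a position are adjacent, so it is enough
# to compare each line with the previous one (keeps memory use flat).
count_distinct_sites() {
    awk '{ key = $1 FS $2 }
         key != prev { n++; prev = key }
         END { print n + 0 }' "$1"
}
```

Usage would look like `count_per_chrom dbsnp_142_chrom_id_rs` and `count_distinct_sites dbsnp_142_chrom_id_rs`. If the second number comes out smaller than 110917213, the difference is the number of extra rs IDs that share a position with another record.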