往期GATK4教程目录:
官网教程
非常复杂,而且步骤繁多,如果只是想测试一下:
首先制作配置文件,如下;
oscc_01 /oscc/WES/alignment/OSCC_01_N_recal.bam /oscc/WES/alignment/OSCC_01_T_recal.bam oscc_04 /oscc/WES/alignment/OSCC_04_N_recal.bam /oscc/WES/alignment/OSCC_04_T_recal.bam oscc_06 /oscc/WES/alignment/OSCC_06_N_recal.bam /oscc/WES/alignment/OSCC_06_T_recal.bam oscc_09 /oscc/WES/alignment/OSCC_09_N_recal.bam /oscc/WES/alignment/OSCC_09_T_recal.bam oscc_10 /oscc/WES/alignment/OSCC_10_N_recal.bam /oscc/WES/alignment/OSCC_10_T_recal.bam oscc_11 /oscc/WES/alignment/OSCC_11_N_recal.bam /oscc/WES/alignment/OSCC_11_T_recal.bam oscc_13 /oscc/WES/alignment/OSCC_13_N_recal.bam /oscc/WES/alignment/OSCC_13_T_recal.bam oscc_14 /oscc/WES/alignment/OSCC_14_N_recal.bam /oscc/WES/alignment/OSCC_14_T_recal.bam oscc_15 /oscc/WES/alignment/OSCC_15_N_recal.bam /oscc/WES/alignment/OSCC_15_T_recal.bam oscc_16 /oscc/WES/alignment/OSCC_16_N_recal.bam /oscc/WES/alignment/OSCC_16_T_recal.bam
需要根据以往的教程安装好GATK并且下载好配套文件。
然后运行下面的代码:
module load java/1.8.0_91
GENOME=/home/jianmingzeng/biosoft/GATK/resources/bundle/hg38/Homo_sapiens_assembly38.fasta
GATK=/home/jianmingzeng/biosoft/GATK/gatk-4.0.3.0/gatk
DBSNP=/home/jianmingzeng/biosoft/GATK/resources/bundle/hg38/dbsnp_146.hg38.vcf.gz
reference=/home/jianmingzeng/biosoft/GATK/resources/bundle/hg38/Homo_sapiens_assembly38.fasta
cat $config_file |while read id
do
arr=($id)
normal_bam=${arr[1]}
tumor_bam=${arr[2]}
sample=${arr[0]}
start=$(date +%s.%N)
echo Mutect2 `date`
time $GATK --java-options "-Xmx10G -Djava.io.tmpdir=./" Mutect2 -R $reference \
-I $tumor_bam -tumor $(basename "$tumor_bam" _recal.bam) \
-I $normal_bam -normal $(basename "$normal_bam" _recal.bam) \
-O ${sample}_mutect2.vcf
$GATK FilterMutectCalls -V ${sample}_mutect2.vcf -O ${sample}_somatic.vcf
echo Mutect2 `date`
dur=$(echo "$(date +%s.%N) - $start" | bc)
printf "Execution time for Mutect2 : %.6f seconds" $dur
echo
done
简单过滤
for i in *_somatic.vcf do j=$(basename "$i" _somatic.vcf ) echo $j cat $i | perl -alne '{if(/^#/){print}else{next unless $F[6] eq ".";next if $F[0] =~/_/;print } }' > ${j}_filter.vcf done
把vcf文件转为maf文件,需要参考我在生信菜鸟团前面的博客
cat config |while read id
do
arr=($id)
normal_bam=${arr[1]}
tumor_bam=${arr[2]}
sample=${arr[0]}
perl ~/biosoft/vcf2maf/vcf2maf.pl --input-vcf ${sample}_filter.vcf --output-maf ${sample}.maf \
--ref-fasta ~/.vep/homo_sapiens/86_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz \
--tumor-id $(basename "$tumor_bam" _recal.bam) --normal-id $(basename "$normal_bam" _recal.bam) --ncbi-build GRCh38
done
得到的maf就可以用maftools去可视化啦!