Problem with the FILTER flag "alignment": it filtered some true-positive variants (Sanger verified)
I tested my GATK 4.1 Mutect2 Best Practices workflow on paired gastric cancer WES data (normal N990005 / tumor T990005) from a 2012 Nature Genetics paper (PMID: 22484628).
I downloaded the data from here:
The paper reports 8 SNVs (actually 9, 1 of which was filtered) and 2 indels (actually 3, 1 of which was filtered), all of which were verified by Sanger sequencing.
The authors aligned with BWA against hg18, so I converted the coordinates to hg19.
I found that 2 of these records are filtered by "alignment":
chr15 79056976 . G A . alignment CONTQ=93;DP=59;ECNT=1;GERMQ=93;MBQ=32,33;MFRL=153,141;MMQ=60,60;MPOS=4;NALOD=1.60;NLOD=11.03;POPAF=6.00;RCNTS=0,4;ROQ=45;SEQQ=76;STRANDQ=72;TLOD=14.19 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:37,0:0.025:37:16,0:19,0:18,19,0,0 0/1:11,7:0.394:18:5,3:5,3:5,6,3,4
chrX 101971433 . C T . alignment CONTQ=93;DP=545;ECNT=2;GERMQ=93;MBQ=30,33;MFRL=121,148;MMQ=60,60;MPOS=18;NALOD=2.41;NLOD=75.78;POPAF=3.19;RCNTS=0,4;ROQ=90;SEQQ=93;STRANDQ=93;TLOD=112.62 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|0:252,0:3.892e-03:252:130,0:107,0:0|1:101971433_C_T:101971433:126,126,0,0 0|1:236,41:0.150:277:127,24:100,16:0|1:101971433_C_T:101971433:137,99,24,17
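To make the evidence behind a filtered call easier to read, here is a minimal Python sketch that splits one of the records above into its fields and recomputes the alt-read fraction from AD. The function names `parse_record` and `alt_fraction` are made up for illustration, and the sketch assumes whitespace-separated columns (as pasted here) with the first sample column being the normal and the second the tumor.

```python
# Record copied from the post; columns are whitespace-separated as pasted there
# (a real VCF file is tab-separated, which split() also handles).
RECORD = (
    "chr15 79056976 . G A . alignment "
    "CONTQ=93;DP=59;ECNT=1;GERMQ=93;MBQ=32,33;MFRL=153,141;MMQ=60,60;MPOS=4;"
    "NALOD=1.60;NLOD=11.03;POPAF=6.00;RCNTS=0,4;ROQ=45;SEQQ=76;STRANDQ=72;TLOD=14.19 "
    "GT:AD:AF:DP:F1R2:F2R1:SB "
    "0/0:37,0:0.025:37:16,0:19,0:18,19,0,0 "
    "0/1:11,7:0.394:18:5,3:5,3:5,6,3,4"
)


def parse_record(line):
    """Split one VCF data line into a dict (hypothetical helper)."""
    fields = line.split()
    chrom, pos, _id, ref, alt, _qual, filt, info, fmt = fields[:9]
    keys = fmt.split(":")
    # One dict per sample column, keyed by the FORMAT field names.
    samples = [dict(zip(keys, col.split(":"))) for col in fields[9:]]
    # INFO entries without "=" (flags) are skipped in this sketch.
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "filters": filt.split(";"),
        "info": info_map,
        "samples": samples,
    }


def alt_fraction(sample):
    """Alt reads / (ref + alt reads), taken from the AD field."""
    ref_n, alt_n = (int(x) for x in sample["AD"].split(",")[:2])
    total = ref_n + alt_n
    return alt_n / total if total else 0.0


if __name__ == "__main__":
    rec = parse_record(RECORD)
    normal, tumor = rec["samples"]  # assumed column order: normal, then tumor
    print(rec["chrom"], rec["pos"], rec["filters"], "TLOD=" + rec["info"]["TLOD"])
    print("normal alt fraction:", alt_fraction(normal))
    print("tumor alt fraction:", round(alt_fraction(tumor), 3))
```

Note that the fraction recomputed from AD need not match the AF value Mutect2 reports, since AF is a model estimate rather than a raw read ratio.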
My FilterAlignmentArtifacts command is as follows:
gatk FilterAlignmentArtifacts -V gatk_mutect2/S086/S086.m2_oncefilt.vcf.gz -I gatk_mutect2/S086/S086.m2.sort.bam --bwa-mem-index-image gatk_db_hg38/Homo_sapiens_assembly38.fasta.img -O gatk_mutect2/S086/S086.m2_twicefilt.vcf.gz
My whole Mutect2 workflow can be found here:
So is this a bug in FilterAlignmentArtifacts, or an expected sacrifice in order to drop false-positive variants?
I would really appreciate any advice. Thanks a lot.
By the way, I compared the results reported in the 2012 paper with those from my GATK 4.1 workflow:
I checked 119 of the variants unique to GATK 4.1: their average AF is 10% and their average depth is 210X, which seems very good.
So I think these variants may be due to the hg18 reference. Is that possible?
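For anyone who wants to repeat this kind of comparison, here is a rough Python sketch of how two call sets can be compared once both are on the same reference build. It is not the script used for the numbers above. The helper names are hypothetical, the two paper sites are the ones quoted in this post, and the `chr1 12345` entry is a made-up placeholder rather than a real call.

```python
def variant_key(chrom, pos, ref, alt):
    """Normalize a call to a comparable key (hypothetical helper)."""
    # Strip a leading "chr" so "chr15" and "15" compare equal.
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (name, int(pos), ref.upper(), alt.upper())


def compare_calls(paper_calls, pipeline_calls):
    """Return (shared, paper_only, pipeline_only) as sets of keys.

    Both inputs must already be on the same reference build;
    coordinates from different builds are not comparable.
    """
    paper = {variant_key(*c) for c in paper_calls}
    pipeline = {variant_key(*c) for c in pipeline_calls}
    return paper & pipeline, paper - pipeline, pipeline - paper


def mean_af_and_depth(calls):
    """Average AF and depth over a list of (af, depth) pairs."""
    if not calls:
        return 0.0, 0.0
    afs = [af for af, _ in calls]
    depths = [dp for _, dp in calls]
    return sum(afs) / len(afs), sum(depths) / len(depths)


# Toy inputs: the two sites quoted in this post plus one placeholder entry.
PAPER_CALLS = [("chr15", 79056976, "G", "A"), ("chrX", 101971433, "C", "T")]
PIPELINE_CALLS = [("15", 79056976, "g", "a"), ("chr1", 12345, "A", "C")]  # placeholder

if __name__ == "__main__":
    shared, paper_only, pipeline_only = compare_calls(PAPER_CALLS, PIPELINE_CALLS)
    print("shared:", sorted(shared))
    print("paper only:", sorted(paper_only))
    print("pipeline only:", sorted(pipeline_only))
```

Keying on chromosome, position, ref and alt keeps the comparison strict. Indels may need left-alignment and normalization first, which this sketch does not attempt.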