If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Weird observation in FORMAT column DP from Unified Genotyper
I am trying Unified Genotyper on a 454 data set combined with some artificial data. The bam file for this data has 12 aligned reads spanning the locus: Chr17:7578406. These 12 reads are obtained from real 454 run.
On executing Unified Genotyper with the following options:
java -Xmx4g -jar /usr/local/lib/GenomeAnalysisTK-2.2-15-g9214b2f/GenomeAnalysisTK.jar -R ~/CGP/ReferenceSeq/fromGATK/bundle1.5/b37/human_g1k_v37.fasta -T UnifiedGenotyper -I NG.CL316.ordered.sorted.bam -o NG.1x.CL316.vcf --dbsnp ~/CGP/ReferenceSeq/fromGATK/bundle1.5/b37/dbsnp_135.b37.vcf -stand_call_conf 30 -stand_emit_conf 0.0 -dcov 5000 -L "17:7,578,400-7,578,410"
I get the following result:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT z 17 7578406 rs28934578 C T 83.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.529;DB;DP=12;Dels=0.00;FS=0.000;HaplotypeScore=2.7884;MLEAC=1;MLEAF=0.500;MQ=38.06;MQ0=0;MQRankSum=0.529;QD=6.98;ReadPosRankSum=2.367 GT:AD:DP:GQ:PL 0/1:7,5:12:99:112,0,195
Then I added a copy of same 12 reads (after modifying the names of the added reads to give each added read a unique name) to the above bam file. I have ensured that the names are unique in the bam file. Then again I run UnifiedGenotyper and get the following result:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT z 17 7578406 rs28934578 C T 165.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.966;DB;DP=24;Dels=0.00;FS=1.885;HaplotypeScore=5.5768;MLEAC=1;MLEAF=0.500;MQ=38.06;MQ0=0;MQRankSum=0.205;QD=6.91;ReadPosRankSum=3.367 GT:AD:DP:GQ:PL 0/1:14,10:22:99:194,0,363
I want to point out that in the first run, I see that INFO and FORMAT column's have DP=12 and AD of 7,5 add up indicating that no read for filtered for making the variant call. However in the second run, even though the 12 extra reads are exactly the same as the earlier 12 reads, we see that while the INFO column DP=24, the FORMAT column DP is 22. Which indicates that 2 reads have been filtered while making the variant call.
Also please note that I tried adding the 12 read incrementally. And observed that till adding the 7th read, the DPs were all matching up. But then when I added the 8th read, the result showed that 2 reads were filtered (from FORMAT column DP). So if only the 8th read was being filtered out, I would think it might be a problem with that read but why is an extra read which was not being filtered earlier is being filtered on addition of this read. Or is the FORMAT column DP incorrect?
The 12 aligned reads that were added to the sam file are attached. Please let me know if you would need any other info.
Thanks for any insight in this issue.