Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Mismatch in the read count/Allele count in bamout and g.vcf files and other bam files
I am using GATK pipeline for variant calling in Targeted Exome samples.
While trying to view the variants in IGV at different levels of variant calling steps (aligning .bam files generated at BaseRecalibration and HaplotypeCaller steps against reference.fasta), i find that the total number of reads aligning at a particular variant location varies. ie.
after BaseRecalibration and before HaplotypeCalling total read count at a variant location (as per IGV- BR.bam alignment) is 274. It shows A count is 1 and C count is 273. But after HaplotypeCaller step, The total read count is 460 (as per IGV-bamout alignment). It also shows that A count is 195 and C count is 265. At the same time the variant record from the corresponding g.vcf file shows something like this:
17 48253171 . C A,<NON_REF> 1802.77 . BaseQRankSum=4.848;ClippingRankSum=-1.291;DP=388;MLEAC=1,0;MLEAF=0.500,0.00;MQ=36.72;MQ0=0;MQRankSum=-14.293;ReadPosRankSum=-14.235 GT:AD:DP:GQ:PL:SB 0/1:217,104,0:321:99:1831,0,6990,2478,7307,9785:217,0,104,0
I understand that HC reassembles the active regions. But then also I see a drastic different in the read/Allele counts.
My questions are:
1. Why the total read count(even Alternate Allele count) is increased suddenly in HC step compared to the previous step(s)?
2. Why the Allele counts and Read depth in the g.vcf file and that in the corresponding bamout files (IGV view) are different?
3. In the above situation, which allele count should be considered bamout(ie IGV) or g.vcf counts?
Thanking you in advance,