Mismatch in the read count/Allele count in bamout and g.vcf files and other bam files

AswathyNAswathyN BangaloreMember

Hello,
I am using GATK pipeline for variant calling in Targeted Exome samples.
While trying to view the variants in IGV at different levels of variant calling steps (aligning .bam files generated at BaseRecalibration and HaplotypeCaller steps against reference.fasta), i find that the total number of reads aligning at a particular variant location varies. ie.
after BaseRecalibration and before HaplotypeCalling total read count at a variant location (as per IGV- BR.bam alignment) is 274. It shows A count is 1 and C count is 273. But after HaplotypeCaller step, The total read count is 460 (as per IGV-bamout alignment). It also shows that A count is 195 and C count is 265. At the same time the variant record from the corresponding g.vcf file shows something like this:

17 48253171 . C A,<NON_REF> 1802.77 . BaseQRankSum=4.848;ClippingRankSum=-1.291;DP=388;MLEAC=1,0;MLEAF=0.500,0.00;MQ=36.72;MQ0=0;MQRankSum=-14.293;ReadPosRankSum=-14.235 GT:AD:DP:GQ:PL:SB 0/1:217,104,0:321:99:1831,0,6990,2478,7307,9785:217,0,104,0

I understand that HC reassembles the active regions. But then also I see a drastic different in the read/Allele counts.
My questions are:
1. Why the total read count(even Alternate Allele count) is increased suddenly in HC step compared to the previous step(s)?
2. Why the Allele counts and Read depth in the g.vcf file and that in the corresponding bamout files (IGV view) are different?
3. In the above situation, which allele count should be considered bamout(ie IGV) or g.vcf counts?

Thanking you in advance,
Aswathy

Tagged:

Best Answer

Answers

  • AswathyNAswathyN BangaloreMember

    Hello Geraldine,
    I just want to be double sure..
    You mean to say that though the read count varies in the bamout file (HC step) and the corresponding g.vcf file, we should consider the read count given in the g.vcf only?
    In these cases how do we visualize the actual variants?

    Thanks in advance,
    Aswathy

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @AswathyN
    Hi Aswathy,

    We realize the difference is inconvenient, but yes, for now please use the GVCF values. One of the developers is working on a feature that allows the reads that are not used by HaplotypeCaller to be flagged in IGV. This will allow the bamout file to more accurately reflect the GVCF.

    The feature will most probably be available in the next release.

    -Sheila

Sign In or Register to comment.