Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Variant Caller and BAM file INFO not matching

nansnans Member
edited July 31 in Ask the GATK team

Hello,
I am using gatk-4.0.3.0 following the Best practices.

BWA-> Picard Mark Dup -> GATK Base Recalib/Apply -> HC variant calling per sample

For one of our samples, we have a variant called at 1:201021671:C/G The HC has the following info (relevant to us)

INFO-DP:569 FORMAT-AD:422,139 FORMAT-DP: 561 GQ:99

1 201021671 . C G 1543.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-13.710;ClippingRankSum=0.000;DP=569;ExcessHet=3.0103;FS=305.296;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=2.75;ReadPosRankSum=-4.819;SOR=9.883 GT:AD:DP:GQ:PL 0/1:422,139:561:99:1572,0,21081

But the BAM file in IGV shows something different for this variant

Total Count - 590,
AD Count - 574,11

Where does HC get a higher alternate count ? and why is the total count different ?

Thanks

Answers

  • nansnans Member
    edited July 31

    Just adding, I also did a -bamout and got different results

    1 201021671 . C G 287.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-11.107;ClippingRankSum=0.000;DP=511;ExcessHet=3.0103;FS=229.865;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=0.57;ReadPosRankSum=-5.668;SOR=8.885 GT:AD:DP:GQ:PL 0/1:419,84:503:99:316,0,16156

    Info: this is targeted gene panel sequencing on 154 genes using Illumina's Nextera Rapid Capture custom panel

    Post edited by nans on
  • bshifawbshifaw Member, Broadie, Moderator admin

    In IGV, try going back to the default view so that you are viewing each aligned read. Then select Color alignments by and choose read group. Your gray reads should now be colored. Next select Group alignments by and choose read group. Click on the variant site and post a screenshot of the view.

  • nansnans Member

    Hi @bshifaw

    Please find attached IGV screenshot

    Also, I did run a bam-readcount on the bam file and the output matches what shows on the IGV

  • nansnans Member

    any suggestions GATK Team ?

  • bshifawbshifaw Member, Broadie, Moderator admin

    Confirm there wasn't anything changed in the command to produce the bamout other than the addition of --bamout. The bamout parameter should just provide the locally realigned bam output, not alter the VCF annotations.

    Here is some info on AD and DP. AD is unfiltered while DP is filtered.

    AD and DP : Allele depth and depth of coverage.
    These are complementary fields that represent two important ways of thinking about the depth of the data for this sample at this site.
    AD is the unfiltered allele depth, i.e. the number of reads that support each of the reported alleles. All reads at the position (including reads that did not pass the variant caller’s filters) are included in this number, except reads that were considered uninformative. Reads are considered uninformative when they do not provide enough statistical evidence to support one allele over another.
    DP is the filtered depth, at the sample level. This gives you the number of filtered reads that support each of the reported alleles. You can check the variant caller’s documentation to see which filters are applied by default. Only reads that passed the variant caller’s filters are included in this number. However, unlike the AD calculation, uninformative reads are included in DP.
    See the Tool Documentation for more details on AD (DepthPerAlleleBySample) and DP (Coverage) for more details.

  • nansnans Member
    edited August 6

    @bshifaw

    the bamout was generated using the same commands.

    My query is not why AD and DP are different. My question is why does GATK have an allele count of 139 for G and why is IGV and samtools calling it 11 only ?

    Post edited by nans on
  • bshifawbshifaw Member, Broadie, Moderator admin
    edited August 7

    That's no good, the results shouldn't be different after adding the -bamout argument, this might be a bug in the earlier version. Switch to the latest version if you can.

    Just want to confirm your question before consulting my team, I'm going to focus on the posted bamout results because that's what HC uses to make calls. Here is a summary of the posted bamout annotations above.

    HaplotypeCaller
    GT: 0/1
    AD: 419(C),84(G)
    DP: 503
    GQ: 99
    PL: 316,0,16156

    IGV
    Total: 561
    C: 500
    G: 58

    So is the question why is HC 84(G) greater than IGV's 58(G)?

  • nansnans Member
    edited August 7

    @bshifaw
    Updating to a newer version is not that simple as we would have to undergo verification and validation of the tool which is a separate process itself. having used this version of the tool on several 1000s of samples, this is the first time I have encountered this issue and would like to know why

    The question being asked is correct but the metrics are wrong

    HaplotypeCaller
    GT: 0/1
    AD: 419(C),84(G)
    DP: 503
    GQ: 99
    PL: 316,0,16156

    IGV
    Total: 590
    C: 574
    G: 11

    So the question why is HC 84(G) greater than IGV's 11(G)?

    Thank you

  • bshifawbshifaw Member, Broadie, Moderator admin

    @nans

    Here is a response from the dev team:

    Both HaplotypeCaller and IGV downsample reads, and I suspect that both of them are downsampling and this is resulting in different numbers

    Disabling downsampling in IGV: https://software.broadinstitute.org/software/igv/Preferences
    Disabling downsampling in HC: run it with --max-reads-per-alignment-start 0
    Rerun haplotypecaller with that parameter and set IGV to not downsample. After that if the numbers aren't the same let us know.

  • 29043594952904359495 Member
    edited August 8

    I have tested that, still not macth.
    IGV 2.5.0, gatk4.0.11.0, followed the igv and mutect2 settings



    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar gatk-package-4.0.11.0-local.jar Mutect2 --intervals xx.bed --native-pair-hmm-threads 8 --pair-hmm-implementation AVX_LOGLESS_CACHING_OMP --input xx.bam --output xx.vcf --reference ucsc.hg19.fasta --tumor-sample yy --panel-of-normals zz.vcf -bamout xxdownsample.bam --max-reads-per-alignment-start 0
    br>





    chr7 55248998 . A ATGGCCAGCG . . DP=5189;ECNT=2;POP_AF=5.000e-08;RPA=1,2;RU=TGGCCAGCG;STR;TLOD=125.89 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:5114,60:0.012:5174:0,0:5114,60:32,32:175,175:60:50:0:0.010,0.010,0.012:6.572e-04,6.857e-04,0.999

    Post edited by 2904359495 on
  • nansnans Member
    edited August 8

    @bshifaw
    Do the downsampling option in IGV was always disabled in the screenshots provided above.

    I re-ran HC with the suggestion option

    $gatk --java-options "-Xmx4g" HaplotypeCaller \
               -R $ref --dbsnp GATK_ref/dbsnp_138.b37.vcf \
                -I $file.bam \
                -stand-call-conf 30.0 --intervals $ROI.bed  \
                -L 1:201021500-201021700 -bamout $file_out.bam  \
                --max-reads-per-alignment-start 0 \
                 -O $out.g.vcf
    

    and now the variant is not picked up when I ran HC with the option --max-reads-per-alignment-start 0

    here is the screen shot of bamout in IGV

    Post edited by nans on
  • nansnans Member

    @bshifaw just wondering if we are able to resolve this issue ? Thanks

  • bshifawbshifaw Member, Broadie, Moderator admin
    edited August 12

    @nans, we're still looking into it. Please share a snippet of your data and resource files used to run the command so that we can replicate and look into it further. Sending a Report

    Post edited by bshifaw on
  • 29043594952904359495 Member

    @nans I also post the picture showing not match, in the command log, I can see a lot saying filtering reads, maybe a little complex, haha, thanks a lot

Sign In or Register to comment.