HaplotypeCaller does not call variant depending on the interval list I use.

I am trying to explain why HaplotypeCaller calls a variant in some cases but not in others. I notice that a FN had very good coverage (>2000x) but wasn't being called. I ran HaplotypeCaller using an interval list of just that site, and the variant was emitted. I increased the size of the surrounding interval (10bp, 50bp, 100bp on either side), and the variant was still called. But if the interval gets too big (1000bp or larger), the gVCF does not have a call at the site:

chr   pos .   C      .   .   END= GT:DP:GQ:MIN_DP:PL  0/0:2198:0:2192:0,0,0

I am curious why the PL's are all zero too!

The output for this site when I use the smaller interval lists is:

chr20   pos .   C   T,A,   36264.77    .   BaseQRankSum=8.793;ClippingRankSum=-0.574;DP=2068;ExcessHet=3.0103;MLEAC=1,0,0;MLEAF=0.500,0.00,0.00;MQRankSum=1.003;RAW_MQ=7438970.00;ReadPosRankSum=-0.478    GT:AD:DP:GQ:PL:SB   0/1:1009,994,1,0:2004:99:36293,0,32500,39597,33950,77303,39022,33568,73438,72576:402,607,486,509

Here's my full command:

java -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx4096m -jar /path/to/GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R /path/to/hs38DH.fa \
    --minPruning 3 --maxNumHaplotypesInPopulation 200 -variant_index_parameter 128000 -variant_index_type LINEAR --emitRefConfidence GVCF --max_alternate_alleles 3 --contamination_fraction_to_filter 0.0 \
    -I /path/to/bam \
    -L /path/to/interval/list \
    -o /path/to/g.vcf \
    -pairHMM VECTOR_LOGLESS_CACHING   

GATK version: version 3.5-0-g36282e4

Note that running in -genotyping_mode GENOTYPE_GIVEN_ALLELES calls it correctly.

Thanks for any help you could offer.

Issue · Github
by Sheila

Issue Number
1361
State
closed
Last Updated
Assignee
Array
Milestone
Array
Closed By
chandrans

Answers

  • shuangBroadshuangBroad Broad75Member, Broadie, Dev

    Hi Nills, when you run HC, did you see a warning about alt allele removal in the following format?

    Removed alt alleles where ploidy is %d and original allele count is %d, whereas after trimming the allele count becomes %d. Alleles kept are:%s
    

    I am trying to fix a bug that might be related to this, and if you do see such a warning, that bug might be the cause.

    If you cannot see the logs for some reason, is this a pooled experiment where both ploidy and # of alt alleles would be high?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @nilshomer
    Hi Nils,

    Can you post the BAM file and bamout file of the region? Please post screenshots of when the variant is called and not called.

    Thanks,
    Sheila

  • nilshomernilshomer Boston, MAMember

    Unfortunately I don't have access to the data anymore. I did see this enough on some NA12878 data that you may want to check out some of your false negatives relative to the NIST and Platinum Genomes data.

Sign In or Register to comment.