Accurate ref/alt read counts for DNPs

mrooney, Cambridge, MA (Member)


I am using HaplotypeCaller in "genotype_given_alleles" mode to obtain REF and ALT read counts for candidate variants (using the AD field). This seems to work fine for SNPs and indels; however, I have trouble with DNPs (e.g. REF=CC, ALT=AT), which are always assigned a variant read count of zero (e.g. "GT:AD:DP:GQ:PL 0/0:331,0:331:99:0,1072,2147483647"). When I look at the HC-generated BAM in a viewer, the variant reads are clearly present in abundance, so the reported read counts seem to be wrong.

Is this expected behavior? If not, could you recommend steps or checks to figure this out?

I have attached my HC parameters, a list of some DNPs that were missed, and a screen shot of the first variant.
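
For readers checking similar output, the REF and ALT counts in a sample column like the one quoted above can be pulled out of the AD field with a few lines of Python. This is only a minimal sketch: the helper names `parse_sample` and `allele_depths` are hypothetical, and it assumes every AD value is an integer (no missing "." entries).

```python
def parse_sample(format_keys: str, sample: str) -> dict:
    """Pair each FORMAT key (e.g. GT, AD, DP) with its value in the sample column."""
    return dict(zip(format_keys.split(":"), sample.split(":")))


def allele_depths(format_keys: str, sample: str) -> list:
    """Return the per-allele read counts from AD, REF first, then each ALT.

    Assumption: all AD entries are integers (no '.' placeholders).
    """
    fields = parse_sample(format_keys, sample)
    return [int(count) for count in fields["AD"].split(",")]


# The record quoted in the post: 331 REF reads, 0 ALT reads.
fmt = "GT:AD:DP:GQ:PL"
smp = "0/0:331,0:331:99:0,1072,2147483647"
ref_count, alt_count = allele_depths(fmt, smp)
print(ref_count, alt_count)
```

Comparing these numbers with a manual count of the reads in the bamout file is one way to quantify the discrepancy described here.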


  • mrooney, Cambridge, MA (Member)
    I forgot to mention that I am using GATK 3.5.
  • mrooney, Cambridge, MA (Member)

    And here is the command line:

    java -Xmx18000M -jar /opt/GenomeAnalysisTK_3.5-0-g36282e4.jar \
        --analysis_type HaplotypeCaller \
        --out HC.vcf \
        -bamout output_HC.bam \
        --bamWriterType ALL_POSSIBLE_HAPLOTYPES \
        --standard_min_confidence_threshold_for_emitting 20 \
        --standard_min_confidence_threshold_for_calling 20 \
        --reference_sequence human.fasta \
        --input_file input.bam \
        --dontUseSoftClippedBases \
        --dontTrimActiveRegions \
        --intervals alleles.vcf \
        --interval_padding 500 \
        --genotyping_mode GENOTYPE_GIVEN_ALLELES \
        --gatk_key gatk.key \
        --forceActive \
        --disableOptimizations \
        --dbsnp sbsnp.vcf \
        --alleles alleles.vcf

  • mrooney, Cambridge, MA (Member)

    I came across a post indicating that HaplotypeCaller cannot call DNPs; rather, it would call two SNPs, which could then be phased with other tools. Should I take this to mean that HaplotypeCaller also cannot handle DNPs in genotype_given_alleles mode? If so, does this imply that HaplotypeCaller will have issues with other complex variants (e.g. REF=CC, ALT=A) that cannot be represented as a simple indel or SNP?
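
To make the "two SNPs" representation concrete, here is a small Python sketch that splits an equal-length substitution such as REF=CC, ALT=AT into its per-base SNPs. It only illustrates the representation; it is not HaplotypeCaller's internal logic. The function name `split_mnp` and the start position 100 are hypothetical.

```python
def split_mnp(pos: int, ref: str, alt: str) -> list:
    """Split an equal-length substitution into (position, ref_base, alt_base) SNPs.

    Positions where the two alleles agree are skipped. Alleles of different
    lengths (e.g. REF=CC, ALT=A) are not plain substitutions, so they are rejected.
    """
    if len(ref) != len(alt):
        raise ValueError("REF and ALT must be the same length to split into SNPs")
    return [
        (pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# Hypothetical DNP starting at position 100: CC -> AT
for snp in split_mnp(100, "CC", "AT"):
    print(snp)
```

A complex variant such as REF=CC, ALT=A raises `ValueError` in this sketch, which mirrors the point of the question: such alleles have no simple SNP-by-SNP decomposition.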

