HaplotypeCaller can't call a 10bp deletion variant

ShawnWilliamsShawnWilliams Wuhan City. Hubei Province. ChinaMember


Hi, GATK team.
I use haplotypeCaller to call variants, but it can't find a 10bp deletion variant, as you can see in the graph.
I use -L targetInterval
-bamWriterType ALL_POSSIBLE_HAPLOTYPES
-bamout haplotype.bam
to see whether haplotype is correctly assembled, but the haplotype.bam is empty, it seems the targetInterval is not active region.

Then I use -forceActive, the haplotype.bam is not empty anymore, but the output vcf file still doesn't contain the 10bp deletion variant, so I'm really confused now, what should I do to call this variant out.
I use gatk3.6.

the 10 bp deletion's position is chr17:29541466.
Here is my pipeline:
java -d64 -server -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Djava.io.tmpdir=$tmp_dir -jar $gatk \
-R $reference_file \
-L 17:29,541,000-29,542,000 \
-bamWriterType ALL_POSSIBLE_HAPLOTYPES \
-bamout test.bam\
-T HaplotypeCaller \
-I $in_dir/NA12878MOD_sort_markdup_vardict_sort_realign_recal.bam \
--dbsnp $dbsnp_del100 \
-forceActive \
-o $out_dir/test.raw.snps.indels.vcf

Answers

  • ShawnWilliamsShawnWilliams Wuhan City. Hubei Province. ChinaMember

    Hi @EADG,
    Thanks for your help, I read these threads, and I find the mapping quality of many reads which have a 10 bp deletion are 19, and HCMappingQualityFilter will filter out those reads which mapping quality less than 20, so most reads which contains variants are filtered, so this interval has few variants. Is this the reason of why this area is not active region?
    And without -forceActive, my bam.out is empty, but if I add -mmq 10, my bam.out contains many haplotypes, and some haplotypes contains the 10 bp deletion variant. The picture is generated with -mmq 10. but my output vcf is still empty.

  • ShawnWilliamsShawnWilliams Wuhan City. Hubei Province. ChinaMember

    Hi, GATK team.
    I have some new found and new problems.
    If I use -bamWriterType ALL_POSSIBLE_HAPLOTYPES, my bam.out contains many reads.
    but if I change this argument to -bamWriterType CALLED_HAPLOTYPES, my bam.out is empty.
    I think the problem is the possible haplotype is filtered in some step. If my thought is right, could you tell me what step filter the possible haplotypes?

    And a strange thing is when I use -mmq 20(default), my bam.out is empty, the log shows that 48 reads (8.03% of total) failing HCMappingQualityFilter, and bam.out is empty, and vcf file is empty too. But if I use -mmq 25, the log shows that 74 reads (12.37% of total) failing HCMappingQualityFilter, and bam.out contains haplotypes, and vcf file contains the 10 bp deletion variant! I thought that when I increase -mmq argument, more reads are filtered, the number of haplotyope should decrease, but the fact is I get the haplotype and the variant with mmq 25 rather than 20.(the highest mapping quality of the read contains the variant is 29. If I increase -mmq to 30, the haplotype and variant can't be found.)

    Another problem is the AF value of the variant is 0.500, the true AF should be about 0.20, and I find the AF value of other variants in my output.vcf is almost 1.000 or 0.500, they are quite from the benchmark. I want to know whether there is a tool of GATK or any argument of haplotypeCaller can be used to recalibrate the AF value of variants.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator

    @ShawnWilliams
    Hi,

    So, even lowering the mapping quality threshold does not help with emitting the call? Can you try the latest version?

    Also, it looks like there are a lot of repeats in the region. You can try some of the advanced arguments listed at the bottom of this article.

    Do you know if the indel is a true variant? You can try searching for other regions in the genome that are homologous, as it is possible this is a mapping artifact.

    -Sheila

  • ShawnWilliamsShawnWilliams Wuhan City. Hubei Province. ChinaMember

    Hi, @Sheila
    Thanks for your reply.
    I have tried gatk3.8, but lowering mapping quality threshold still doen't help.
    And my problem is that I can call the variant with -mmq 25, but -mmq 20 doesn't work out. I want to know the reason.

    There is a benchmark shows the indel is a true variant, and I have cost much time to confirm this is a true variant.

    And could you tell me how to recalibrate the AF value of the variants, because I found the AF value of variant called by haplotypeCaller are all 1 or 0.500, this is very different to the benchmark.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator
    edited October 2017

    @ShawnWilliams
    Hi,

    The mmq issue is baffling, but can you try with GATK4 latest beta? There won't be any more work done on GATK3, so I hope this issue is not present in GATK4.

    As for the AF value, that is just the way the calculation is done by HaplotypeCaller (it takes into account how many variant alleles are present in the genotypes, not how many reads contribute to the result). If you would like to calculate the AF using the reads depth, you will have to calculate it using the AD field. Have a look at this thread for more information.

    -Sheila

Sign In or Register to comment.