CombineVariants results in completely wrong genotype fields in multithreaded mode; a bug?

Dear GATK team,

I spotted this behavior a while ago (in the pre-release beta version, I think) and assumed it would be fixed, but it still seems to be there.
When you combine two VCF files with CombineVariants in multithreaded mode, it produces totally wrong numbers in the AD, DP, and GQ fields for some variants.
They are way off from the real numbers in the input VCFs, sometimes going into the thousands instead of the tens!
It works correctly in single-threaded mode...

It seems there is a serious bug somewhere. Not all variants are affected, but you can see the differences by running diff on two output files, one made in single-threaded mode and one made in multithreaded mode.
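
A plain diff also flags header and INFO differences, so a field-level comparison can be easier to read. The following is a minimal sketch in Python, not part of GATK: the function names read_genotypes and diff_genotypes are hypothetical, the sample records are made up for illustration, and it assumes uncompressed, tab-delimited VCF text with one record per site.

```python
import io


def read_genotypes(handle):
    """Map (chrom, pos, ref, alt) -> {sample: {FORMAT key: value}} for a VCF stream."""
    samples = []
    records = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # sites-only record, no genotype columns to compare
        chrom, pos, _id, ref, alt = fields[:5]
        keys = fields[8].split(":")
        records[(chrom, pos, ref, alt)] = {
            name: dict(zip(keys, entry.split(":")))
            for name, entry in zip(samples, fields[9:])
        }
    return records


def diff_genotypes(single, multi, keys=("AD", "DP", "GQ")):
    """List (site, sample, key, single_value, multi_value) for every mismatch."""
    mismatches = []
    for site in sorted(set(single) & set(multi)):
        for sample, values in single[site].items():
            other = multi[site].get(sample, {})
            for key in keys:
                if values.get(key) != other.get(key):
                    mismatches.append(
                        (site, sample, key, values.get(key), other.get(key))
                    )
    return mismatches


# Made-up example records standing in for the two CombineVariants outputs.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n"
single_run = read_genotypes(io.StringIO(
    header + "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP:GQ\t0/1:10,12:22:99\n"
))
multi_run = read_genotypes(io.StringIO(
    header + "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP:GQ\t0/1:10,12:2200:99\n"
))
mismatches = diff_genotypes(single_run, multi_run)
for row in mismatches:
    print(row)
```

For real outputs, the same functions can be given open file handles for the single-threaded and multithreaded results instead of the in-memory strings; dividing the number of distinct mismatching sites by the number of shared sites then gives a rough estimate of how many variants are affected.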

Thank you.

Answers

  • igcocole Member

    It looks as though SelectVariants might also be affected in -nt mode? There seems to be another recent post about issues in SelectVariants output.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Thanks for pointing this out, we're looking into it for both tools.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi there,

    We're unable to reproduce this behavior; in our hands these tools work perfectly fine in multithreaded mode. It looks like it might be an issue with your filesystem not handling the multithreading operations properly. We're looking at ways to test for this problem in order to be able to issue a warning to users when this happens, but we unfortunately don't foresee being able to fix it. At this point all we can say is that if you're experiencing this problem you should run the tools without -nt.

    If anyone else experiences these issues, let us know. If we get more cases we may be able to find out what they have in common and pinpoint the precipitating conditions.

  • igcocole Member

    Hi,

    I've seen this behavior on physically different filesystems and different computers, with different datasets.
    I can try to see whether local hard drives behave the same way. But it seems strange for this to be a filesystem issue...

    Did you try it on relatively large multisample VCF files? I think only about 1% of SNPs turn out like this.
    I'm not using the -nt option because of this, but I think it can cause very serious errors for users who are not aware of it.

    Unfortunately I cannot share the files in which I saw the issue most recently. But I'll try to find ones that I can share and that reproduce it.

    Thanks.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    We routinely run these tools on very large multisample datasets with -nt and haven't seen anything like this, and we do have systematic tests that would catch these problems if they were occurring with our data. But if you can narrow down the data or conditions needed to reproduce the issue, we're more than happy to look at them. We share your concern for users who are not aware of this potential issue, which is why we're looking at ways to at least detect the problem and issue a warning, even if we can't fix it.
