MergeVcfs - elements of the input Iterators are not sorted according to the comparator

I'm trying to merge VCFs from 86 HaplotypeCaller jobs with GATK MergeVcfs and am getting this error:

2018-07-03T13:58:06.462132033Z java.lang.IllegalStateException: The elements of the input Iterators are not sorted according to the comparator htsjdk.variant.variantcontext.VariantContextComparator

The VCFs pass validation with GATK ValidateVariants. I used GATK version 4.0.2.0, and the reads were aligned with GSNAP.
Here is the command line (most of the --INPUT VCFs are omitted to keep it short and readable):

/opt/gatk --java-options "-Xmx2048M" MergeVcfs --OUTPUT WES_human_Illumina.pe_.filtered.sorted.vc.vcf --INPUT tasks/cf6dc246-c79d-4c54-8a72-0be160a50b62/vc_GATK_HaplotypeCaller_0_s/WES_human_Illumina.pe_.filtered.sorted.vcf --INPUT tasks/cf6dc246-c79d-4c54-8a72-0be160a50b62/vc_GATK_HaplotypeCaller_1_s/WES_human_Illumina.pe_.filtered.sorted.vcf --INPUT tasks/cf6dc246-c79d-4c54-8a72-0be160a50b62/vc_GATK_HaplotypeCaller_2_s/WES_human_Illumina.pe_.filtered.sorted.vcf --REFERENCE_SEQUENCE /sbgenomics/workspaces/ac2e7439-25bf-4c9f-bd35-2a29e376d2b6/tasks/cf6dc246-c79d-4c54-8a72-0be160a50b62/vc_SBG_FASTA_Indices/Homo_sapiens_assembly38.fasta

Can you tell me why this exception is thrown and how I can resolve it?

Answers

  • shlee (Cambridge; Member, Broadie)

    Hi @Vladimir_Kovacevic,

    First, please do not post duplicate questions! I see you posted this exact question at https://gatkforums.broadinstitute.org/gatk/discussion/10328/combinevariants-in-gatk4.

    There is something about your variant calls and how they are formatted that is triggering the error.

    For reference, the error comes from https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/variantcontext/VariantContextComparator.html.
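To make the ordering concrete, here is a minimal Python sketch of a contig-then-position comparison. It is a simplification, not the htsjdk source: it assumes records compare first by their contig's index in the sequence dictionary and then by start position. The helpers `contig_index` and `compare` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (contig, start); other VCF columns omitted


def contig_index(contigs: List[str]) -> Dict[str, int]:
    """Map each contig name to its rank in the sequence dictionary."""
    return {name: i for i, name in enumerate(contigs)}


def compare(a: Record, b: Record, index: Dict[str, int]) -> int:
    """Negative if a sorts before b, zero if tied, positive otherwise.

    Contigs are ordered by dictionary rank, not alphabetically. A contig
    missing from the dictionary raises KeyError.
    """
    contig_diff = index[a[0]] - index[b[0]]
    if contig_diff != 0:
        return contig_diff
    return a[1] - b[1]  # same contig: order by start position
```

With the dictionary order chr1, chr2, chr10, a record on chr10 sorts after one on chr2, even though a plain text sort would put "chr10" first. A file sorted lexically can therefore look sorted and still violate the dictionary order.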

    MergeVcfs is particular about its inputs. To quote the tool documentation:

    The input variant data must adhere to the following rules:

    If there are samples, those must be the same across all input files.
    Input file headers must contain compatible declarations for common annotations (INFO, FORMAT fields) and filters.
    Input file variant records must be sorted by their contig and position following the sequence dictionary provided or the header contig list.
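The third rule can be checked directly on each input. The sketch below uses a hypothetical helper, `first_unsorted_record`; it is not a GATK tool. It assumes an uncompressed VCF whose `##contig` header lines define the expected order, and it reports the first data record that breaks that order.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Captures the ID in a header line such as ##contig=<ID=chr1,length=248956422>
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def first_unsorted_record(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line number, line) of the first out-of-order record, or None.

    Assumes an uncompressed VCF whose ##contig header lines give the
    expected contig order; records are compared by (contig index, POS).
    """
    order: Dict[str, int] = {}
    previous: Optional[Tuple[int, int]] = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            match = CONTIG_RE.match(line)
            if match:
                # The index is the contig's rank in header order.
                order.setdefault(match.group(1), len(order))
            continue
        if not line:
            continue  # ignore blank lines
        contig, pos = line.split("\t", 2)[:2]
        if contig not in order:
            raise ValueError(f"line {lineno}: contig {contig!r} is not declared in the header")
        key = (order[contig], int(pos))
        if previous is not None and key < previous:
            return lineno, line
        previous = key
    return None
```

For example, running `with open(path) as f: print(first_unsorted_record(f))` over each input VCF would show which file, if any, is out of order and where.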

    One way to ensure compatibility is to sort each of your VCFs with SortVcf while specifying the same sequence dictionary with -SD.
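Conceptually, sorting against a shared dictionary means ordering every file by the same (contig index, position) key. The following is a minimal sketch that assumes plain tab-separated VCF data lines and a contig list taken from the shared dictionary. `sort_records` is a hypothetical helper; in practice a tool such as SortVcf would do the sorting.

```python
from typing import List, Tuple


def sort_records(records: List[str], dictionary: List[str]) -> List[str]:
    """Sort VCF data lines by (contig index in dictionary, POS).

    `dictionary` is the contig order shared by every file. Python's sort
    is stable, so records with the same contig and position keep their
    original relative order.
    """
    index = {name: i for i, name in enumerate(dictionary)}

    def key(record: str) -> Tuple[int, int]:
        contig, pos = record.split("\t", 2)[:2]
        return index[contig], int(pos)

    return sorted(records, key=key)
```

Because every file is sorted against the same `dictionary`, all inputs agree on one ordering, and that agreement is what the merge step relies on.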

    You wrote: "merge VCFs from 86 HaplotypeCaller jobs"

    If your HaplotypeCaller jobs are scattered across intervals for the same set of samples, then MergeVcfs will work. However, if each HaplotypeCaller job is on a different sample, then you cannot use MergeVcfs. In that case, I believe you would use CombineVariants.
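A minimal sketch of why scatter-gather output suits MergeVcfs: if each shard covers a different interval of the same samples, gathering reduces to a k-way merge of already-sorted record streams. This illustrates the idea only and is not MergeVcfs' implementation. Records are reduced to (contig, POS) tuples, and `merge_shards` is a hypothetical name. Shards with different sample columns would need their columns combined, which this record-level merge does not do.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, int]  # (contig, POS); other VCF columns omitted


def merge_shards(shards: List[Iterable[Record]], dictionary: List[str]) -> Iterator[Record]:
    """K-way merge of per-interval record streams for the same samples.

    Each shard must already be sorted by (dictionary index, POS). An
    out-of-order record raises ValueError, analogous to (but not the same
    as) the IllegalStateException that MergeVcfs reports.
    """
    index = {name: i for i, name in enumerate(dictionary)}

    def key(rec: Record) -> Tuple[int, int]:
        return index[rec[0]], rec[1]

    def checked(shard: Iterable[Record], n: int) -> Iterator[Record]:
        # Pass records through, failing fast if this shard goes backwards.
        previous: Optional[Tuple[int, int]] = None
        for rec in shard:
            k = key(rec)
            if previous is not None and k < previous:
                raise ValueError(f"shard {n} is not sorted at {rec}")
            previous = k
            yield rec

    # heapq.merge only yields a sorted result if every input is sorted.
    yield from heapq.merge(*(checked(s, n) for n, s in enumerate(shards)), key=key)
```

The check inside `checked` matters because `heapq.merge` itself never verifies its inputs. Given an unsorted shard, it would silently emit records out of order.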

  • Hi @shlee,
    Thank you for your answer. Trust me, I work with many different tools, variant calls, and sources every day, and I can't remember every error I have run into, so I did not post the same question on purpose. Please excuse me :smile:
    The samples in all VCFs are the same.
