Possible LeftAlignVariants bug for multi-allelic indel variant

We have a complex VCF record that doesn't appear to be properly treated by LeftAlignVariants, and I couldn't find evidence that this behavior has been reported anywhere else.

The record is:

17      19561175        .       GGTTTGT G,GTTTGT        49      PASS    AC=1,1;AF=0.50,0.50;AN=2;DP=117;DS;MQ=60;MQ0=0;source=Locus     GT:AB:AD:DP     1/2:0.925:68,49:117

Admittedly, the GGTTTGT>GTTTGT variation is odd, since it is better specified as GG>G, but there's nothing semantically wrong with the record as written. (If you're wondering where it came from, it came from simulated data.)
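The trimming implied by "better specified as GG>G" can be sketched in a few lines of Python. This is a minimal illustration assuming a biallelic REF/ALT pair and 1-based positions; `trim_alleles` is a hypothetical helper, not a GATK function.

```python
def trim_alleles(pos, ref, alt):
    """Trim shared bases from a biallelic REF/ALT pair (1-based pos).

    Shared trailing bases are removed first, then shared leading bases.
    Each allele always keeps at least one base, as VCF requires.
    """
    # Drop the common suffix.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop the common prefix, moving the position right for each base removed.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(trim_alleles(19561175, "GGTTTGT", "GTTTGT"))  # (19561175, 'GG', 'G')
print(trim_alleles(19561175, "GGTTTGT", "G"))       # (19561175, 'GGTTTGT', 'G')
```

Trimming alone never changes the position of this deletion; moving it left requires looking at the reference base before it.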

The challenge, however, is that the left-aligned version of this variant is AG>A. So I would expect the following output from LeftAlignVariants:

17      19561174        .       AGGTTTGT AG,AGTTTGT

That's rather ugly, but I think it's the correct way to write the original complex variant after left-alignment. Alternatively, a split and phased representation would achieve the same:

17      19561174        .       AG      A       ....    0|1:...
17      19561175        .       GGTTTGT G       ....    1|0:...

But I bet that would cause all sorts of problems in LeftAlignVariants if you tried to make it happen.
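Whether the split, phased pair really describes the same two haplotypes as the original 1/2 genotype can be checked on a toy sequence. In this minimal Python sketch, `apply_variant` is a hypothetical helper, and the reference is invented except for the A preceding GGTTTGT (taken from the expected record above); toy positions 3 and 4 stand in for 19561174 and 19561175.

```python
# Hypothetical toy reference: only the "A" before GGTTTGT comes from the post;
# the flanking "CC" bases are made up. Positions are 1-based.
REFERENCE = "CCAGGTTTGTCC"


def apply_variant(reference, pos, ref, alt):
    """Return the haplotype obtained by replacing REF with ALT at 1-based pos."""
    start = pos - 1
    if reference[start:start + len(ref)] != ref:
        raise ValueError(f"REF {ref} does not match the reference at {pos}")
    return reference[:start] + alt + reference[start + len(ref):]


# Original multi-allelic record with genotype 1/2: one haplotype per ALT allele.
original = {
    apply_variant(REFERENCE, 4, "GGTTTGT", "G"),
    apply_variant(REFERENCE, 4, "GGTTTGT", "GTTTGT"),
}

# Split, phased records: each haplotype carries exactly one of the two records.
split = {
    apply_variant(REFERENCE, 3, "AG", "A"),
    apply_variant(REFERENCE, 4, "GGTTTGT", "G"),
}

print(original == split)  # True
```

Sets are compared because the original 1/2 genotype is unphased, so only the pair of haplotypes matters, not which one is listed first.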

I think this is a hard problem to solve in general. I just wanted to post here to (1) see whether you agree that it behaves this way, and (2) help anyone else who might be seeing something similar.
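To see why the multi-allelic record does not move, here is a minimal Python sketch of the usual normalization loop (the one popularized by vt normalize): repeatedly drop a shared last base, pad with the preceding reference base whenever an allele becomes empty, then drop shared leading bases. It is not GATK's implementation. The reference is the same hypothetical toy as before (only the A before GGTTTGT comes from the post), with toy position 4 standing in for 19561175.

```python
REFERENCE = "CCAGGTTTGTCC"  # hypothetical toy reference, 1-based positions


def normalize(reference, pos, alleles):
    """Left-align and trim a record given as [REF, ALT1, ALT2, ...].

    All alleles are shifted together, so a multi-allelic record only moves
    if every allele can move.
    """
    alleles = list(alleles)
    changed = True
    while changed:
        changed = False
        # If every allele is non-empty and ends with the same base, drop it.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, pad all alleles with the preceding base
        # (stopping at the start of the sequence).
        if not all(alleles) and pos > 1:
            pos -= 1
            alleles = [reference[pos - 1] + a for a in alleles]
            changed = True
    # Drop shared leading bases while every allele keeps at least one base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles


print(normalize(REFERENCE, 4, ["GGTTTGT", "GTTTGT"]))       # (3, ['AG', 'A'])
print(normalize(REFERENCE, 4, ["GGTTTGT", "G"]))            # (4, ['GGTTTGT', 'G'])
print(normalize(REFERENCE, 4, ["GGTTTGT", "G", "GTTTGT"]))  # (4, ['GGTTTGT', 'G', 'GTTTGT'])
```

Because the three alleles end in T, G and T, they never share a last base, so the joint record stays where it is even though the single-G deletion on its own shifts one base left to AG>A. In this sketch, splitting the record into biallelic records first is what gives each allele its own chance to shift.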



  • As a follow-up to this: I would find it useful for LeftAlignVariants to left-align multi-allelic indels, but I have found that the tool does not maintain the order of the variants.

    For example, a call like:

    java -jar GenomeAnalysisTK.jar -R all.fa -T LeftAlignAndTrimVariants --variant multi_indel.vcf --splitMultiallelics --trimAlleles -o multi_indel_left.vcf

    takes input like:

    chr1 8598 . T - TAA,TA

    and outputs:

    chr1 8598 . T - TA

    chr1 8598 . T - TAA

    where the order of the variants is reversed. I know this particular example does not involve any left-alignment, but I see the same behavior with variants that were left-aligned, and which records get reversed seems random. Is there a way for the tool to maintain the order of multi-allelic variants so that I can keep track of them?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    I'm not sure I understand what you mean by maintaining the order -- are you saying that if you have

    T - TAA, TA

    you want it to always output

    T - TAA
    T - TA

    Is that right? I'd have to check the code to see whether there is a rationale for outputting one or the other first, but my guess is that it's related to the data structure we use to store the alleles and how we retrieve them. Are you sure the ordering is random, rather than alphabetical?
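The reported TA-before-TAA output is consistent with the alphabetical-ordering guess. The minimal Python sketch below is not GATK code (`Record`, `split_multiallelic` and the `alt_index` field are hypothetical names). It shows that a lexicographic sort on the allele reproduces that order, whereas a stable sort on position alone, or an index recorded at split time, keeps the input order. It only shows that the observation fits the hypothesis, not how GATK actually orders its output.

```python
from typing import NamedTuple


class Record(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_index: int  # hypothetical: 1-based position of this ALT in the input record


def split_multiallelic(chrom, pos, ref, alts):
    """Split a multi-allelic record into biallelic records, keeping ALT order."""
    return [Record(chrom, pos, ref, alt, i) for i, alt in enumerate(alts, start=1)]


records = split_multiallelic("chr1", 8598, "T", ["TAA", "TA"])

# Sorting on the allele string puts "TA" before "TAA" (a prefix sorts first),
# which matches the output reported above.
by_allele = sorted(records, key=lambda r: (r.chrom, r.pos, r.alt))
print([r.alt for r in by_allele])  # ['TA', 'TAA']

# Python's sort is stable, so sorting on position only keeps the input ALT order.
by_position = sorted(records, key=lambda r: (r.chrom, r.pos))
print([r.alt for r in by_position])  # ['TAA', 'TA']

# The alt_index field lets the original order be restored after any re-sorting.
print([r.alt for r in sorted(by_allele, key=lambda r: r.alt_index)])  # ['TAA', 'TA']
```

Carrying an explicit index through the split is the more robust option, since it does not depend on any downstream tool preserving record order.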
