CombineVariants: GT field is not updated when merging variants with different ALT alleles

sbaheti (Member; Posts: 7)
edited September 2013 in Ask the team

Hi,

My two VCF files have different alternate alleles at the same position, because I called the variants using two different callers. When I run CombineVariants on both files, the GT field is not updated properly. The AD field is also not updated properly, but running VariantAnnotator fixes that issue.

VCF 1:

chr1    87708015    rs58006838  C   T   145.77  .   AC=1;AF=0.500;AN=2;BaseQRankSum=-1.479;DB;DP=14;Dels=0.00;FS=0.000;HaplotypeScore=3.9299;MLEAC=1;MLEAF=0.500;MQ=68.54;MQ0=0;MQRankSum=-1.109;QD=10.41;ReadPosRankSum=0.925  GT:AD:DP:GQ:PL  0/1:3,9:14:37:174,0,37

VCF 2:

chr1    87708015    .   C   A   .   PASS    AN=2;DP=4;NS=1  GT:AD:DP:GQ 0/1:2,2:4:8.42

Merged VCF file:

chr1    87708015    rs58006838  C   T,A 145.77  PASS    AC=1,0;AF=0.500,0.00;AN=2;BaseQRankSum=-1.479;DB;DP=18;Dels=0.00;FS=0.000;HaplotypeScore=3.9299;MLEAC=1;MLEAF=0.500;MQ=68.54;MQ0=0;MQRankSum=-1.109;NS=1;QD=10.41;ReadPosRankSum=0.925;set=Intersection GT:AD:DP:GQ **0/1**:3,9,2:14:37

Command used:

java -jar $gatk/2.7-1-g42d771f/GenomeAnalysisTK.jar -T CombineVariants -V one.vcf.gz -V two.vcf.gz -o test.vcf -R $ref

Is this a known limitation or a bug?

Answers

  • Geraldine_VdAuwera (Administrator, GSA Member; Posts: 5,235)

    It looks to me like this is working properly. It's a heterozygous site in both cases. What do you think should be different?

    Geraldine Van der Auwera, PhD

  • sbaheti (Member; Posts: 7)

    The output VCF does not follow the VCF 4.1 format, and the record is not a valid variant according to the GATK ValidateVariants walker. The possible genotype values for this multi-allelic variant are 0/2, 1/2 or 2/2, right?

  • Geraldine_VdAuwera (Administrator, GSA Member; Posts: 5,235)

    Why would 0/1 not be correct? The sample is heterozygous, and the first of the ALT alleles is chosen (you can change that by specifying a different merge option if you want).

    ValidateVariants is saying that it is not valid? Can you post the output?

    Geraldine Van der Auwera, PhD
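
To make the allele-index convention discussed above concrete, here is a minimal Python sketch that maps GT indices back to allele bases for the merged record. It is not GATK code; `decode_genotype` is a hypothetical helper written only for illustration. It relies on the VCF convention that index 0 is REF and indices 1..n are the ALT alleles in listed order.

```python
def decode_genotype(ref, alt, gt):
    """Translate a VCF GT string such as '0/1' into allele bases."""
    # Index 0 is REF; indices 1..n are the ALT alleles in the order listed.
    alleles = [ref] + alt.split(",")
    # Phased genotypes use '|', unphased genotypes use '/'.
    sep = "|" if "|" in gt else "/"
    return sep.join("." if i == "." else alleles[int(i)] for i in gt.split(sep))


# Merged record from this thread: REF=C, ALT=T,A
print(decode_genotype("C", "T,A", "0/1"))  # C/T
print(decode_genotype("C", "T,A", "0/2"))  # C/A
```

With ALT listed as T,A, the merged genotype 0/1 reads as C/T (the call from VCF 1), while the C/A call from VCF 2 would correspond to 0/2.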

  • sbaheti (Member; Posts: 7)
    edited September 2013

    Here is the command and the error:

    java -jar 2.7-1-g42d771f/GenomeAnalysisTK.jar -T ValidateVariants -V variants.vcf.gz -R $ref

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.7-1-g42d771f):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: File /data2/bsi/secondary/Kocher_Jean-Pierre_m026645/whole_genome/simulated_normal_8lanes/.tmp/s_normal_8l/variant/chr1_old/variants.vcf.gz fails strict validation: one or more of the ALT allele(s) for the record at position chr1:87708015 are not observed at all in the sample genotypes

    The VCF line for that position is:

    zcat variants.vcf.gz | grep 87708015

    chr1 87708015 rs58006838 C T,A 145.77 PASS AC=1,0;AF=0.500,0.00;BaseQRankSum=-1.664;DB;DP=18;Dels=0.00;FS=0.000;HaplotypeScore=3.9299;MLEAC=1;MLEAF=0.500;MQ=68.54;MQ0=0;MQRankSum=-0.555;NS=1;QD=10.41;ReadPosRankSum=0.925;set=Intersection;ED=11 GT:AD:DP:GQ:SET 0/1:3,9,2:14:37:Intersection

    Thanks

    Saurabh

  • Geraldine_VdAuwera (Administrator, GSA Member; Posts: 5,235)

    Ah, I see. In the strictest sense it's true that it's an issue because you have two ALT alleles but only one sample. But the genotype itself is properly expressed. Either you use a different merge option so that the second ALT allele is discarded, or you ignore the validation error (because it's not a big problem).

    Geraldine Van der Auwera, PhD
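
The condition named in the error message (an ALT allele that no sample genotype refers to) can be illustrated with a short Python sketch. This only mirrors the wording of the message; it is not the ValidateVariants implementation, and `unobserved_alts` is a hypothetical helper.

```python
def unobserved_alts(alt, genotypes):
    """Return the ALT alleles that no sample genotype refers to."""
    alts = alt.split(",")
    seen = set()
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        for idx in gt.replace("|", "/").split("/"):
            if idx != ".":  # skip missing calls
                seen.add(int(idx))
    # ALT alleles are numbered from 1; index 0 is REF.
    return [a for i, a in enumerate(alts, start=1) if i not in seen]


# Merged record from this thread: ALT=T,A with a single sample called 0/1
print(unobserved_alts("T,A", ["0/1"]))  # ['A']
```

For the record in this thread, the single sample is 0/1, so allele A (index 2) is never observed, which matches the strict validation failure reported at chr1:87708015.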

  • sbaheti (Member; Posts: 7)

    Thanks for your input. I went through the GATK documentation but didn't find the parameter that would let me get rid of the second allele. Could you let me know how I can do that?
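
The thread does not name the merge option. Independently of GATK, the idea of dropping an unused ALT allele can be sketched in Python as below. This is an illustrative assumption, not a GATK feature: `trim_unused_alts` is a hypothetical helper, it handles a single sample only, and it does not update per-allele INFO fields such as AC and AF.

```python
def trim_unused_alts(alt, gt, ad):
    """Drop ALT alleles the genotype never uses; remap GT and AD to match."""
    alts = alt.split(",")
    sep = "|" if "|" in gt else "/"
    calls = gt.split(sep)
    # ALT indices (1-based) that the genotype actually refers to.
    used = sorted({int(c) for c in calls if c != "." and int(c) > 0})
    # Map old allele indices to new ones; REF always stays 0.
    remap = {0: 0}
    for new, old in enumerate(used, start=1):
        remap[old] = new
    new_alt = ",".join(alts[i - 1] for i in used) or "."
    new_gt = sep.join("." if c == "." else str(remap[int(c)]) for c in calls)
    depths = ad.split(",")  # AD lists the REF depth first, then one value per ALT
    new_ad = ",".join([depths[0]] + [depths[i] for i in used])
    return new_alt, new_gt, new_ad


# Merged record from this thread: ALT=T,A  GT=0/1  AD=3,9,2
print(trim_unused_alts("T,A", "0/1", "3,9,2"))  # ('T', '0/1', '3,9')
```

Applied to the merged record, this keeps only T as the ALT allele and shortens AD from 3,9,2 to 3,9, so every listed ALT allele is observed in the genotype.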
