Error in BeagleOutputToVCF

hirokomatsui Posts: 8 Member
edited January 2013 in Ask the GATK team

I'm having a problem running BeagleOutputToVCF. It fails with this error message:

##### ERROR stack trace
java.lang.IllegalStateException: Allele in genotype AT not in the variant context [A*, G]
        at org.broadinstitute.sting.utils.variantcontext.VariantContext.validateGenotypes(VariantContext.java:1197)
        at org.broadinstitute.sting.utils.variantcontext.VariantContext.validate(VariantContext.java:1137)
...

Is this because my Beagle output files include indels?

I'm using GenomeAnalysisTK-2.2-10-gbbafb72, and my command line is:

> java -Xmx4g -jar /raid/software/src/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar \
-R hg19_lite.fa \
-T BeagleOutputToVCF \
-V chr1.01.vcf \
-beagleR2:BEAGLE beagle/chr1.01.r2 \
-beaglePhased:BEAGLE chr1.01.phased \
-beagleProbs:BEAGLE chr1.01.gprobs \
-o beagle_vcf/chr1.01.vcf \
-L 1:1-10000000

Thanks,
Hiroko Matsui

Answers

  • Geraldine_VdAuwera Posts: 6,973 Administrator, GATK Developer admin

    Hi Hiroko,

    This looks like a bug that was fixed in version 2.3. Could you please upgrade to the latest version and try again?

    Geraldine Van der Auwera, PhD

  • hirokomatsui Posts: 8 Member

    I see the same error with GATK version 2.3-6-gebbba25.

  • hirokomatsui Posts: 8 Member

    I don't know if it's related, but there is a SNP that my input VCF file lists as:
    chr1 10140662 rs75035497 C G

    while my gprobs file reports this at the same position:
    chr1:10140662 CT C

    and the phased file says the same:
    M chr1:10140662 CT CT CT C CT

    At this position, GATK shows the error message:
    MESSAGE: Allele in genotype CT not in the variant context [C*, G]

    Thanks,
    Hiroko
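
The comparison described in this post (VCF alleles versus Beagle alleles at the same position) can be automated to locate every offending record. The following is a minimal sketch, not part of GATK. It assumes the gprobs file is whitespace-separated with the marker written as `chrom:pos` in the first column, the two alleles in the second and third columns, and a header row starting with `marker`. The function names `parse_gprobs_alleles` and `find_allele_mismatches` are hypothetical.

```python
def parse_gprobs_alleles(gprobs_lines):
    """Map each Beagle marker ('chrom:pos') to its (alleleA, alleleB) pair."""
    alleles = {}
    for line in gprobs_lines:
        fields = line.split()
        # Skip blank lines and the header row (assumed to start with "marker").
        if len(fields) < 3 or fields[0] == "marker":
            continue
        alleles[fields[0]] = (fields[1], fields[2])
    return alleles


def find_allele_mismatches(vcf_lines, gprobs_alleles):
    """Return (marker, ref, alt, beagle_alleles) for every VCF record whose
    Beagle alleles are not all present among the VCF REF/ALT alleles."""
    mismatches = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # VCF meta-information and column header lines
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        marker = f"{chrom}:{pos}"
        beagle = gprobs_alleles.get(marker)
        if beagle is None:
            continue  # position absent from the Beagle output
        vcf_alleles = {ref, *alt.split(",")}
        if not set(beagle) <= vcf_alleles:
            mismatches.append((marker, ref, alt, beagle))
    return mismatches
```

Both functions accept any iterable of lines, so open file handles for the gprobs and VCF files can be passed directly. For the record quoted above, the sketch would flag `chr1:10140662` because `CT` is neither the REF (`C`) nor the ALT (`G`) allele.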

  • Geraldine_VdAuwera Posts: 6,973 Administrator, GATK Developer admin
    edited January 2013

    Hmm, maybe. Could you please upload a snippet of your files for testing? Instructions are here if needed:

    http://www.broadinstitute.org/gatk/guide/article?id=1894

    Geraldine Van der Auwera, PhD

  • hirokomatsui Posts: 8 Member

    I have uploaded a file, hm.tar.gz, at ftp.broadinstitute.org.
    The tarball contains snippets of the gprobs, phased, r2 and VCF files; the command line and the stack trace are in the file console.txt.
    Thanks for your help.
    Hiroko

  • hirokomatsui Posts: 8 Member

    Sorry, that was my mistake. I needed to remove from the VCF file the records that I had filtered out of the Beagle input file.
    Thank you for your time.
    Hiroko
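
The fix described here, dropping VCF records that were filtered out of the Beagle input, can be scripted. The sketch below is one possible approach under stated assumptions, not a GATK feature: it keeps a VCF record only when the Beagle gprobs file has a marker at the same `chrom:pos` whose two alleles both appear among the record's REF/ALT alleles. It assumes the same gprobs layout as the lines quoted earlier in the thread (marker, alleleA, alleleB, with a header row starting with `marker`). The function names are hypothetical.

```python
def beagle_allele_sets(gprobs_lines):
    """Map each Beagle marker ('chrom:pos') to the set of its two alleles."""
    sets = {}
    for line in gprobs_lines:
        fields = line.split()
        # Skip blank lines and the header row (assumed to start with "marker").
        if len(fields) < 3 or fields[0] == "marker":
            continue
        sets[fields[0]] = {fields[1], fields[2]}
    return sets


def filter_vcf(vcf_lines, allele_sets):
    """Yield header lines unchanged, plus only those records that Beagle
    reported with alleles consistent with the record's REF/ALT."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # malformed or blank line: drop it
        chrom, pos, _id, ref, alt = fields[:5]
        beagle = allele_sets.get(f"{chrom}:{pos}")
        if beagle is not None and beagle <= {ref, *alt.split(",")}:
            yield line
```

Matching on alleles as well as position matters for the case reported above: the SNP `C`/`G` and the Beagle marker `CT`/`C` share the position `chr1:10140662`, so a position-only filter would keep the SNP record and the error would persist. The yielded lines can be written to a new VCF file, which is then passed to `-V`.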

  • NGCrawford Posts: 2 Member

    I'm getting the same error. "BeagleOutputToVCF" works correctly on all my chromosomes but one. Is there a way to get the error message to emit the position of the offending SNV?

  • NGCrawford Posts: 2 Member

    Also, I'm running GenomeAnalysisTK-2.4-3-g2a7af43.

  • Geraldine_VdAuwera Posts: 6,973 Administrator, GATK Developer admin

    @NGCrawford, you can append -l DEBUG to your command line to activate debug output. BTW that's lowercase L, not uppercase i.

    Geraldine Van der Auwera, PhD

  • blueskypy Posts: 228 Member

    I'm also getting the following error msg:

    ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-gf57256b):

    ERROR MESSAGE: Key BGL_RM_WAS_ATCT found in VariantContext field FILTER at 1:86951348 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.

  • blueskypy Posts: 228 Member

    -bash-4.1$ grep '86951348' SAMPLE06_H06_pe.var.recaled.vcf

    1 86951348 . A ATCT 986.73 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=3.028;ClippingRankSum=0.155;DP=41;FS=3.358;MLEAC=1;MLEAF=0.500;MQ=55.90;MQ0=0;MQRankSum=-1.338;POSITIVE_TRAIN_SITE;QD=8.02;ReadPosRankSum=0.296;VQSLOD=2.08;culprit=FS;set=indel GT:AD:GQ:PL 0/1:21,18:99:41,40,1087

    The command I use is this:

    java -Xmx4g -jar $gatkDir/GenomeAnalysisTK.jar -T BeagleOutputToVCF -l DEBUG \
     -R $refGenome \
     -V $sampleID.var.recaled.vcf \
     -beagleR2:BEAGLE $sampleID.var.recaled.bgl.r2 \
     -beaglePhased:BEAGLE $sampleID.var.recaled.bgl.phased \
     -beagleProbs:BEAGLE $sampleID.var.recaled.bgl.gprobs \
     -o $sampleID.var.recaled.bgl.vcf
    
  • blueskypy Posts: 228 Member

    Could someone help with this error?

  • Geraldine_VdAuwera Posts: 6,973 Administrator, GATK Developer admin

    Hi @blueskypy,

    We tracked down this error to a bug. We'll fix this asap but in the meantime you can try to use --unsafe LENIENT_VCF_PROCESSING to bypass the problem. I'm not sure it will work but it's worth a shot.

    Geraldine Van der Auwera, PhD

  • blueskypy Posts: 228 Member

    Hi Geraldine,
    Thanks so much for your help! Yes, the --unsafe option helps.
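
A VCF written with the lenient workaround may still contain FILTER values, such as `BGL_RM_WAS_ATCT`, that have no matching `##FILTER` line in the header, and stricter downstream tools may reject them. The sketch below lists such values so the header can be completed by hand. It is an illustration based on the VCF format only (FILTER is the seventh column, multiple values are separated by `;`, and `PASS` and `.` need no declaration). The function name `undeclared_filters` is hypothetical.

```python
import re

# Matches header lines such as: ##FILTER=<ID=LowQual,Description="...">
FILTER_ID = re.compile(r"^##FILTER=<ID=([^,>]+)")


def undeclared_filters(vcf_lines):
    """Return {filter_key: 'chrom:pos' of first occurrence} for FILTER values
    that are used in records but not declared in the VCF header."""
    declared = {"PASS", "."}  # reserved values that need no ##FILTER line
    missing = {}
    for line in vcf_lines:
        if line.startswith("##"):
            match = FILTER_ID.match(line)
            if match:
                declared.add(match.group(1))
            continue
        if line.startswith("#"):
            continue  # the #CHROM column header line
        fields = line.split(None, 7)
        if len(fields) < 7:
            continue
        for key in fields[6].split(";"):
            if key not in declared:
                missing.setdefault(key, f"{fields[0]}:{fields[1]}")
    return missing
```

Running it over the output VCF gives one entry per undeclared key together with the first position where it occurs, which is the same information the original error message reported for a single site.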

  • Geraldine_VdAuwera Posts: 6,973 Administrator, GATK Developer admin

    FYI, we have implemented a fix for this issue which will be in the upcoming 2.6 release.

    Geraldine Van der Auwera, PhD
