Bug in creating the output VCF file after Beagle

Tiphaine Posts: 53 Member
edited October 2012 in Ask the GATK team

Hi,

I used Beagle to phase my data, but I have a problem with some indels.

Example:

Input VCF:

2       68599872        .       ATG     A       14.40   PASS    AC=1;AC1=1;AF=0.028

Input for Beagle created by ProduceBeagleInput:

2:68599872 TG - 1.0000 0.0000 0.0000 ......

Output VCF created by BeagleOutputToVCF:

2       68599872        .       ATG     .       14.40   BGL_RM_WAS_-    AC1=1;AF1=0.02965.....

Error message from CombineVariants:

MESSAGE: Badly formed variant context at location 68599872 in contig 2. Reference length must be at most one base shorter than location size

Can you help me?

Tiphaine

Post edited by Geraldine_VdAuwera on
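In the output record above, BeagleOutputToVCF kept the three-base REF (ATG) but replaced the ALT with "." and set the BGL_RM_WAS_- filter, and CombineVariants then rejects that same site (contig 2, position 68599872). As a minimal sketch, assuming uncompressed, tab-delimited VCF files, the hypothetical shell function below lists every record with that shape (no ALT allele, multi-base REF) so they can be located across all inputs before merging. It only finds candidate records; it does not reproduce GATK's own validation logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): list records that have no ALT allele
# (ALT is ".") but whose REF spans more than one base, the same shape as the
# BeagleOutputToVCF record above (REF=ATG, ALT=.).
# Assumes uncompressed, tab-delimited VCF input given as file arguments or on stdin.
list_mono_multibase_ref() {
  awk -F'\t' '
    /^#/ { next }                        # skip meta-information and header lines
    $5 == "." && length($4) > 1 {        # column 4 is REF, column 5 is ALT
      print $1 "\t" $2 "\t" $4 "\t" $7   # CHROM, POS, REF, FILTER
    }
  ' "$@"
}
```

For example, `list_mono_multibase_ref beagle_output.vcf` (hypothetical file name) prints one tab-separated line per matching record.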

Answers

  • Geraldine_VdAuwera Posts: 6,418 Administrator, GATK Developer admin

    Have you validated the VCF file output by Beagle? If it fails validation, you may need to contact the authors of Beagle; if their tool is producing bad VCF files, we can't help with that. But if the problem is on our end, we'll do what we can.

    Geraldine Van der Auwera, PhD

  • ebanks Posts: 683 GATK Developer mod

    Actually, this looks like it may be a bug in our code. We'll take a quick look and get back to you with some feedback.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • Geraldine_VdAuwera Posts: 6,418 Administrator, GATK Developer admin

    Yeah, a shot of caffeine later I realized our BeagleOutputToVCF may be the culprit, since that's what's making the VCF. Sorry about that.

    Geraldine Van der Auwera, PhD

  • ebanks Posts: 683 GATK Developer mod

    Okay, it looks like Beagle called your site monomorphic, so BeagleOutputToVCF is filtering the site and setting the ALT allele to "." (not polymorphic). That looks reasonable, so the problem you are hitting must be in CombineVariants. Are you using the latest version of the GATK? If so, what is your command line? (If not, please update to the latest version.)

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • Tiphaine Posts: 53 Member

    Sorry, I am not on the latest version; I am using v2.0-35-g2d70733. My command line is:

     java -jar $GATK_HOME/GenomeAnalysisTK.jar -R $RefGen -T BeagleOutputToVCF -V $VcfFile \
         -beagleR2:BEAGLE $r2 -beaglePhased:BEAGLE $phase -beagleProbs:BEAGLE $probs \
         -o $vcfBeagle -U LENIENT_VCF_PROCESSING -et NO_ET -K $GATK_KEY

  • ebanks Posts: 683 GATK Developer mod

    Sorry, it's the CombineVariants command line that we need.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • Tiphaine Posts: 53 Member
    edited November 2012

    I run it in a PBS script, so each variable is defined beforehand:

     java -jar $GATK_HOME/GenomeAnalysisTK.jar -R $RefGen -T CombineVariants -U LENIENT_VCF_PROCESSING \
         --out $outputFile \
         -V:input1 $input1 -V:input2 $input2 -V:input3 $input3 -V:input4 $input4 -V:input5 $input5 \
         -V:input6 $input6 -V:input7 $input7 -V:input8 $input8 -V:input9 $input9 -V:input10 $input10 \
         -V:input11 $input11 -V:input12 $input12 -V:input13 $input13 -V:input14 $input14 -V:input15 $input15 \
         -V:input16 $input16 -V:input17 $input17 -V:input18 $input18 -V:input19 $input19 -V:input20 $input20 \
         -V:input21 $input21 -V:input22 $input22 -V:inputX $inputX \
         -genotypeMergeOptions PRIORITIZE \
         -priority input1,input2,input3,input4,input5,input6,input7,input8,input9,input10,input11,input12,input13,input14,input15,input16,input17,input18,input19,input20,input21,input22,inputX \
         -et NO_ET -K $GATK_KEY
    
    Post edited by Geraldine_VdAuwera on
  • ebanks Posts: 683 GATK Developer mod

    I would try it with the latest version of the GATK. If it still fails, I recommend finding a subset of the data with which you can replicate this error (e.g. just 2 of your input VCF files to CombineVariants) and then posting the records at position 68599872 here, so we can help you figure out where the problem is.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
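Pulling out the records at 68599872, as suggested above, can be done with standard shell tools before re-running GATK. A minimal sketch, assuming uncompressed, tab-delimited VCFs; `records_at` is a hypothetical helper, not a GATK tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): print every data line at CHROM:POS
# from each VCF given, prefixed with the file name, so the same site can be
# compared across files (for example before and after BeagleOutputToVCF).
# Assumes uncompressed, tab-delimited VCFs; usage: records_at CHROM POS FILE...
records_at() {
  local chrom=$1 pos=$2
  shift 2
  awk -F'\t' -v c="$chrom" -v p="$pos" '
    /^#/ { next }                               # skip header lines
    $1 == c && $2 == p { print FILENAME ": " $0 }
  ' "$@"
}
```

For example, `records_at 2 68599872 "$VcfFile" "$vcfBeagle"` would show the site as it looks in the BeagleOutputToVCF input and output, reusing the variable names from the command line earlier in this thread.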

Sign In or Register to comment.