Error in running variant recalibration

tanu06 (Canada; Member)
edited March 2018 in Ask the GATK team

I am using GATK variant recalibration. It works fine on SNPs but throws an error on the indel file. The error and my sample file are as follows:
ERROR MESSAGE: Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data

Input file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Q11     Q11D1   Q11D2   Q11D4   Q11D5
1.1     1773    .       TTTTGAAATATTTAGATAA     T       407.08  .       AC=1;AF=0.167;AN=6;BaseQRankSum=-1.076e+00;ClippingRankSum=0.00;DP=104;ExcessHet=3.0103;FS=12.041;MLEAC=1;MLEAF=0.167;MQ=56.82;MQRankSum=1.75;QD=25.44;ReadPosRankSum=1.21;SOR=1.402    GT:AD:DP:GQ:PL  0/0:55,0:55:99:0,108,1620       0:11,0:11:99:0,253      0:13,0:13:99:0,357      0:9,0:9:99:0,204        1:3,13:16:99:450,0
1.1     1792    .       CTTTAAAAGAAAATACTGGACAATTTTTTGATTTGAATTGGTTTTGAAATATGAATATATTGTATAATATGAGATTAAGGTAAATTATTGAAATTCAATATATATGACATTCTTATTCTTTTTTCTGGGTTTTTTGATGATT  C       407.08  .       AC=1;AF=0.167;AN=6;BaseQRankSum=-6.730e-01;ClippingRankSum=1.35;DP=99;ExcessHet=3.0103;FS=12.041;MLEAC=1;MLEAF=0.167;MQ=56.82;MQRankSum=1.35;QD=25.44;ReadPosRankSum=2.56;SOR=1.402     GT:AD:DP:GQ:PL  0/0:50,0:50:55:0,55,1227        0:11,0:11:99:0,253      0:13,0:13:99:0,357      0:9,0:9:99:0,204        1:3,13:16:99:450,0

Please suggest.
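
The error message above concerns the #CHROM header line: it reports a FORMAT column with no sample columns after it. A minimal Python sketch for checking this by hand, assuming an uncompressed VCF and treating the tab as the only field separator (as the VCF format specifies). The function name check_vcf_columns is hypothetical and is not part of GATK; it only counts columns and does not validate the file in any other way.

```python
def check_vcf_columns(lines):
    """Return (line_number, problem) tuples for a VCF given as an iterable of lines.

    Hypothetical helper: it only checks tab-separated column counts.
    """
    problems = []
    expected = None  # number of columns declared by the #CHROM line
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            expected = len(fields)
            if "FORMAT" in fields and expected < 10:
                # 8 fixed columns + FORMAT = 9, so samples start at column 10
                problems.append((number, "FORMAT column declared but no sample columns"))
            elif expected < 8:
                # typical symptom of tabs having been replaced by spaces
                problems.append((number, "header has fewer than 8 tab-separated columns"))
            continue
        if expected is None:
            problems.append((number, "data line before #CHROM header"))
            continue
        if len(fields) != expected:
            problems.append((number, f"expected {expected} columns, found {len(fields)}"))
    if expected is None:
        problems.append((0, "no #CHROM header line found"))
    return problems


# Example use on the file from this post:
# with open("A_E_indels.vcf") as handle:
#     for number, problem in check_vcf_columns(handle):
#         print(f"line {number}: {problem}")
```

An empty result only means the column counts are consistent; it does not rule out other causes of the error.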




  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    @tanu06 Did you do any processing on the file that might have messed with the spacing between fields, e.g., replaced tabs with spaces? That could cause parsing issues.

  • dovab (Member)

    Hi @Geraldine_VdAuwera, I work with @tanu06. We did try changing the spacing, but we continue to get the error message.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I see. Can you clarify how the file was produced? Please list all operations that were involved, no matter how minor.

  • dovab (Member)
    edited March 2018

    java -jar /usr/local/src/GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta --variant A_raw_variants.g.vcf.gz --variant B_raw_variants.g.vcf.gz --variant C_raw_variants.g.vcf.gz --variant D_raw_variants.g.vcf.gz --variant E_raw_variants.g.vcf.gz -o A_E.g.vcf

    grep -v INDEL A_E.g.vcf > A_E_indels.vcf

    grep -v '#' A_E_indels.vcf | sort | less --chop-long-lines

    java -jar /usr/local/src/GenomeAnalysisTK.jar -T SelectVariants -R training.fasta -V A_E.g.vcf -selectType INDEL -o A_E_indels.vcf

    java -Xmx4g -jar /usr/local/src/GenomeAnalysisTK.jar -T VariantRecalibrator -R reference.fasta -input A_E_indels.vcf -recalFile A_E_indel.recal -tranchesFile A_E_indel.tranches -resource:Drone,known=true,training=true,truth=true,prior=12.0 INDEL_TRAINING.vcf -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum -mode INDEL -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 --maxGaussians 4 -nt 4

    This last step is the one that is failing. I originally had the same error with SelectVariants, but grep helped with that. The same thing doesn't help with VariantRecalibrator, though. Again, when I followed the exact same procedure for SNPs, it worked.

    Thank you,

  • Sheila (Broad Institute; Member, Broadie admin)

    Hi Dova,

    What happens if you run VariantRecalibrator with --mode SNP and --mode INDEL separately on the output of GenotypeGVCFs without doing any grepping or SelectVariants?
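
The suggestion above amounts to two VariantRecalibrator runs that share the same unprocessed input (A_E.g.vcf, the GenotypeGVCFs output) and differ only in mode, resource and output names. A rough Python sketch that assembles both command lines, using only flags that already appear in the commands posted in this thread. The helper build_recal_command, the file name SNP_TRAINING.vcf and the output prefixes are hypothetical placeholders, and the SNP run is assumed to reuse the same resource settings as the indel run, which may not match the original setup.

```python
import shlex

JAR = "/usr/local/src/GenomeAnalysisTK.jar"  # path taken from the commands in this thread


def build_recal_command(mode, input_vcf, reference, resource_spec,
                        resource_vcf, annotations, out_prefix):
    """Assemble a VariantRecalibrator argument list (hypothetical helper)."""
    if mode not in ("SNP", "INDEL"):
        raise ValueError(f"unsupported mode: {mode}")
    cmd = [
        "java", "-Xmx4g", "-jar", JAR,
        "-T", "VariantRecalibrator",
        "-R", reference,
        "-input", input_vcf,
        "-recalFile", f"{out_prefix}.recal",
        "-tranchesFile", f"{out_prefix}.tranches",
        f"-resource:{resource_spec}", resource_vcf,
    ]
    for name in annotations:
        cmd += ["-an", name]  # one -an flag per annotation
    cmd += ["-mode", mode]
    return cmd


ANNOTATIONS = ["QD", "DP", "FS", "SOR", "ReadPosRankSum", "MQRankSum"]
RESOURCE = "Drone,known=true,training=true,truth=true,prior=12.0"

# Both runs read the GenotypeGVCFs output directly, with no grep or SelectVariants step.
for mode, training, prefix in [
    ("SNP", "SNP_TRAINING.vcf", "A_E_snp"),        # SNP_TRAINING.vcf is a placeholder name
    ("INDEL", "INDEL_TRAINING.vcf", "A_E_indel"),
]:
    command = build_recal_command(mode, "A_E.g.vcf", "reference.fasta",
                                  RESOURCE, training, ANNOTATIONS, prefix)
    print(shlex.join(command))  # shlex.join requires Python 3.8+
```

The sketch only prints the commands so they can be reviewed before running; the tranche and --maxGaussians options from the original command are left out for brevity and can be appended the same way.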

