
Error in running variant recalibration

tanu06 (Canada; Member)
edited March 2018 in Ask the GATK team

I am using GATK VariantRecalibrator. It works fine on SNPs but throws an error on the indel file. The error and a sample of my input file are as follows:
ERROR MESSAGE: Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data

Input file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Q11     Q11D1   Q11D2   Q11D4   Q11D5
1.1     1773    .       TTTTGAAATATTTAGATAA     T       407.08  .       AC=1;AF=0.167;AN=6;BaseQRankSum=-1.076e+00;ClippingRankSum=0.00;DP=104;ExcessHet=3.0103;FS=12.041;MLEAC=1;MLEAF=0.167;MQ=56.82;MQRankSum=1.75;QD=25.44;ReadPosRankSum=1.21;SOR=1.402    GT:AD:DP:GQ:PL  0/0:55,0:55:99:0,108,1620       0:11,0:11:99:0,253      0:13,0:13:99:0,357      0:9,0:9:99:0,204        1:3,13:16:99:450,0
1.1     1792    .       CTTTAAAAGAAAATACTGGACAATTTTTTGATTTGAATTGGTTTTGAAATATGAATATATTGTATAATATGAGATTAAGGTAAATTATTGAAATTCAATATATATGACATTCTTATTCTTTTTTCTGGGTTTTTTGATGATT  C       407.08  .       AC=1;AF=0.167;AN=6;BaseQRankSum=-6.730e-01;ClippingRankSum=1.35;DP=99;ExcessHet=3.0103;FS=12.041;MLEAC=1;MLEAF=0.167;MQ=56.82;MQRankSum=1.35;QD=25.44;ReadPosRankSum=2.56;SOR=1.402     GT:AD:DP:GQ:PL  0/0:50,0:50:55:0,55,1227        0:11,0:11:99:0,253      0:13,0:13:99:0,357      0:9,0:9:99:0,204        1:3,13:16:99:450,0

Please suggest a fix.




  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    @tanu06 Did you do any processing on the file that might have changed the spacing between fields, e.g., replaced tabs with spaces? That could cause parsing issues.
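One way to look for the delimiter problem described above is to count tab-separated fields on the `#CHROM` line and on every record. The following is only a sketch: `check_vcf_columns` is a hypothetical helper, not part of GATK, and it assumes an uncompressed VCF and a POSIX `awk`. It also flags a header whose last column is `FORMAT` with no sample names after it, which is the situation the error message describes.

```shell
#!/usr/bin/env bash
# check_vcf_columns is a hypothetical helper (not a GATK tool).
# It splits every line on tabs only, so fields separated by spaces
# show up as a wrong column count.
check_vcf_columns() {
    awk -F'\t' '
        /^##/ { next }    # skip meta-information lines
        /^#CHROM/ {
            ncol = NF
            # FORMAT as the 9th and last column means no sample columns follow.
            if (NF == 9 && $9 == "FORMAT") {
                print "header: FORMAT present but no sample columns"
                bad = 1
            }
            # Fewer than 8 fields suggests the header is not tab-delimited.
            if (NF < 8) {
                print "header: only " NF " tab-separated fields"
                bad = 1
            }
            next
        }
        # Every record should have as many fields as the header line.
        NF != ncol {
            print "line " NR ": " NF " fields, header has " ncol
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running `check_vcf_columns A_E_indels.vcf`, and likewise on the training resource VCF, would print any offending lines and return a non-zero status if it finds one. A clean result does not rule out other causes of the error.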

  • dovab (Member)

    Hi @Geraldine_VdAuwera, I work with @tanu06. We did try changing the spacing, and we continue to get the error message.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I see — can you clarify how the file was produced? Please list all operations that were involved no matter how minor.

  • dovab (Member)
    edited March 2018

    java -jar /usr/local/src/GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta --variant A_raw_variants.g.vcf.gz --variant B_raw_variants.g.vcf.gz --variant C_raw_variants.g.vcf.gz --variant D_raw_variants.g.vcf.gz --variant E_raw_variants.g.vcf.gz -o A_E.g.vcf

    grep -v INDEL A_E.g.vcf > A_E_indels.vcf

    grep -v '#' A_E_indels.vcf | sort | less --chop-long-lines

    java -jar /usr/local/src/GenomeAnalysisTK.jar -T SelectVariants -R training.fasta -V A_E.g.vcf -selectType INDEL -o A_E_indels.vcf

    java -Xmx4g -jar /usr/local/src/GenomeAnalysisTK.jar -T VariantRecalibrator -R reference.fasta -input A_E_indels.vcf -recalFile A_E_indel.recal -tranchesFile A_E_indel.tranches -resource:Drone,known=true,training=true,truth=true,prior=12.0 INDEL_TRAINING.vcf -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum -mode INDEL -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 --maxGaussians 4 -nt 4

    This last step is the one that is failing. I originally had the same error with SelectVariants, but grep helped with that. The same approach doesn't help with VariantRecalibrator, though. Again, when I followed the exact same procedure for SNPs, it worked.

    Thank you,
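A note on the grep step in the list above: `grep -v PATTERN` prints the lines that do not match, so `grep -v INDEL` drops any line containing the string INDEL rather than selecting indel records. The sketch below shows that behaviour on three made-up lines; the input is invented for illustration and is not taken from the files in this thread.

```shell
# Hypothetical three-line input, invented for illustration.
demo_lines() {
    printf '%s\n' 'header mentioning INDEL' 'record one' 'record two'
}

# -v inverts the match: the line containing INDEL is dropped, the rest are kept.
kept=$(demo_lines | grep -v INDEL)
printf '%s\n' "$kept"
```

In the commands as listed, the later SelectVariants call also writes to A_E_indels.vcf, so it would overwrite whatever the grep step produced.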

  • Sheila (Broad Institute; Member, Broadie ✭✭✭✭✭)

    Hi Dova,

    What happens if you run VariantRecalibrator with --mode SNP and --mode INDEL separately on the output of GenotypeGVCFs without doing any grepping or SelectVariants?

