Mills reference for indel VQSR

vasya Member Posts: 5

Hi all --

This should be a simple problem -- I cannot find a valid version of the Mills indel reference in the resource bundle, or anywhere else online!

All versions of the reference VCF are stripped of genotypes and do not contain a FORMAT column or any additional annotations.

I am accessing the Broad's public FTP, and none of the Mills VCF files in bundle folders 2.5 or 2.8 is a full VCF. I understand that there are "sites only" VCFs, but I can't seem to find anything else.

Can anyone link me to a version that contains the recommended annotations for indel VQSR, or that can be annotated?


Best Answer


  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,421 admin
    edited March 2014

    Hi @vasya,

    You don't need to have the annotations in the reference VCF for VQSR; you only need them in your test VCF. Are you experiencing issues running the tool?


    Geraldine Van der Auwera, PhD
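
As a rough illustration of the point above: since the annotations only need to be present in the callset being recalibrated, it is the input VCF whose header is worth checking. The Python sketch below lists which requested annotations have no `##INFO` definition in a VCF header. The helper name `missing_info_annotations` and the sample header are hypothetical, and a declared annotation is not guaranteed to be populated on every record.

```python
import re

# Hypothetical helper for illustration; not part of GATK or any VCF library.
_INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def missing_info_annotations(header_lines, wanted):
    """Return the names in `wanted` that have no ##INFO definition in the header."""
    declared = set()
    for line in header_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = _INFO_ID.match(line)
        if match:
            declared.add(match.group(1))
    return [name for name in wanted if name not in declared]


# Made-up header lines, used only to exercise the function.
example_header = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Example depth">',
    '##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Example">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

print(missing_info_annotations(example_header, ["DP", "MQRankSum", "ReadPosRankSum"]))
```

With the made-up header above, only `ReadPosRankSum` is reported as undeclared.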

  • vasya Member Posts: 5

    Thanks for the quick reply Geraldine! I misunderstood what data were being used to train the model.

    And yes, I am having trouble with the VQSR of indels. GATK throws the following error:

    ERROR MESSAGE: Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data

    The Mills reference is missing a FORMAT column.

    The relevant parts of the command line (GATK v2.8):

    --use_annotation "DP"
    --use_annotation "MQRankSum"
    --use_annotation "ReadPosRankSum"
    --mode "INDEL"
    --input:input_0,vcf "/ephemeral/0/condor/dir_20949/tmp-gatk-_t7uu2/input_variants_0.vcf"
    --resource:Unknown,vcf,known=true,training=true,truth=true,bad=false,prior=12.0 "/ephemeral/0/condor/dir_20949/tmp-gatk-_t7uu2/input_Unknown_0.vcf"

    Here the file "input_Unknown_0.vcf" is pulled directly from the Broad's FTP (/bundle/2.8/hg19/Mills_and_1000G_gold_standard.indels.hg19.vcf.gz).

    We have successfully run the VQSR on SNPs contained in my input VCF using the recommended reference files, and nearly identical annotations.
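
To narrow down an error like the one quoted above, it can help to look at what the resource file's header actually declares. The Python sketch below counts `##FORMAT` definition lines and inspects the `#CHROM` column line for a FORMAT column and sample names. It rests on an assumption not confirmed in this thread: that the message refers to a `#CHROM` line listing FORMAT with no sample column after it. The function name `describe_vcf_header` and the example header are hypothetical.

```python
def describe_vcf_header(header_lines):
    """Summarise FORMAT-related facts from the header lines of a VCF."""
    format_definitions = 0
    columns = []
    for line in header_lines:
        line = line.rstrip("\n")
        if line.startswith("##FORMAT="):
            format_definitions += 1
        elif line.startswith("#CHROM"):
            # The column line is tab-separated; drop the leading '#'.
            columns = line.lstrip("#").split("\t")
            break
    # A VCF has 8 fixed columns; FORMAT, if present, is the 9th.
    has_format_column = len(columns) > 8 and columns[8] == "FORMAT"
    samples = columns[9:] if has_format_column else []
    return {
        "format_definitions": format_definitions,
        "has_format_column": has_format_column,
        "samples": samples,
        # Assumed trigger for the error: FORMAT column present, no samples.
        "format_without_samples": has_format_column and not samples,
    }


# Made-up header, used only to exercise the function.
example_header = [
    "##fileformat=VCFv4.1",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT",
]

print(describe_vcf_header(example_header))
```

For a gzipped file, the lines could be read with `gzip.open(path, "rt")` from the Python standard library and passed to the function directly.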

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,421 admin

    Hm. Can you try deleting the FORMAT definition line in the header? That might do the trick. The file shouldn't need a FORMAT column as far as I can remember. Not sure how we ended up emitting a file with a malformed header, will check that.

    Geraldine Van der Auwera, PhD
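
The suggestion above, deleting the FORMAT definition line(s) from the header, can be sketched in Python as a simple line filter. This is a minimal illustration, not a GATK utility: the name `strip_format_definitions` is hypothetical, it removes only `##FORMAT=` meta-information lines, and it leaves the `#CHROM` column line and all records untouched.

```python
def strip_format_definitions(lines):
    """Yield VCF lines with the ##FORMAT meta-information lines removed."""
    for line in lines:
        if line.startswith("##FORMAT="):
            continue  # drop the FORMAT definition, keep everything else
        yield line


# Made-up sites-only VCF content, used only to exercise the function.
example_vcf = [
    "##fileformat=VCFv4.1",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##INFO=<ID=set,Number=1,Type=String,Description="Example">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tAT\t.\tPASS\tset=example",
]

cleaned = list(strip_format_definitions(example_vcf))
print("\n".join(cleaned))
```

Because the function is a generator over lines, it can be applied to an open file handle and written back out line by line without loading the whole VCF into memory.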

  • vasya Member Posts: 5

    I have tried a couple of modified versions of the reference:

    • Removing the FORMAT fields from the header.
    • Removing all but a GT FORMAT field from the header, and adding a GT FORMAT column.

    Unfortunately, both of these produced the same error. Is this the same file on the internal FTP server? Can I get a copy that has been successfully used previously?

    On an unrelated note -- congrats on 5,000 posts! It's a real testament to the support that you provide!

  • vasya Member Posts: 5

    Thanks for your help Geraldine. The recalibration worked perfectly with the file that you specified.
