# Mills reference for indel VQSR

Posts: 5

Hi all --

This should be a simple problem -- I cannot find a valid version of the Mills indel reference in the resource bundle, or anywhere else online!

All versions of the reference VCF are stripped of genotypes and do not contain a FORMAT column or any additional annotations.

I am accessing the Broad's public FTP, and none of the Mills files in bundle folders 2.5 or 2.8 is a full VCF. I understand that these are "sites only" VCFs, but I can't seem to find anything else.

Can anyone link me to a version that contains the recommended annotations for indel VQSR, or that can be annotated?

edited March 2014

Hi @vasya,

You don't need the annotations in the reference VCF for VQSR; they only have to be present in your test VCF. Are you experiencing issues running the tool?
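To confirm that the test VCF (rather than the reference VCF) carries the annotations passed to the tool, a quick scan of the INFO column can help. The following is a minimal Python sketch, not part of GATK; the function names `info_keys_seen` and `missing_annotations` are hypothetical, and it assumes a tab-delimited VCF whose lines have already been read as text.

```python
def info_keys_seen(lines, max_records=1000):
    """Collect the INFO keys found in the first max_records data lines."""
    keys = set()
    seen = 0
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            if entry and entry != ".":
                # "DP=10" -> "DP"; flag entries such as "DB" have no "=".
                keys.add(entry.split("=", 1)[0])
        seen += 1
        if seen >= max_records:
            break
    return keys


def missing_annotations(lines, wanted):
    """Return the wanted annotations that never appear in the scanned records."""
    return sorted(set(wanted) - info_keys_seen(lines))
```

Calling `missing_annotations(open("my_calls.vcf"), ["DP", "MQRankSum"])` on a hypothetical input file returns an empty list when both annotations appear somewhere in the scanned records. Some annotations are only emitted at a subset of sites, which is why the sketch looks across many records instead of only the first.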

Geraldine Van der Auwera, PhD

• Posts: 5

Thanks for the quick reply Geraldine! I misunderstood what data were being used to train the model.

And yes, I am having trouble with the VQSR of indels. GATK throws the following error:

`##### ERROR MESSAGE: Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data`

The Mills reference is missing a FORMAT column.
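One way to check this reading of the error is to compare what the header declares with the columns actually present. The sketch below is a hedged illustration in Python, not GATK's own validation logic; `header_summary` and `declares_format_without_samples` are hypothetical names, and the assumption is that the error relates to `##FORMAT=` declarations appearing in a file that has no FORMAT or sample columns.

```python
def header_summary(lines):
    """Summarise FORMAT declarations and the columns named on the #CHROM line."""
    format_defs = 0
    columns = []
    for line in lines:
        if line.startswith("##FORMAT="):
            format_defs += 1
        elif line.startswith("#CHROM"):
            # "#CHROM\tPOS\t..." -> ["CHROM", "POS", ...]
            columns = line.rstrip("\n").lstrip("#").split("\t")
            break  # the column line is the last header line
    return {
        "format_defs": format_defs,
        "has_format_column": "FORMAT" in columns,
        # The 8 fixed columns plus FORMAT come before any sample columns.
        "n_samples": max(0, len(columns) - 9),
    }


def declares_format_without_samples(lines):
    """True when the header defines FORMAT fields but no sample columns exist."""
    summary = header_summary(lines)
    return summary["format_defs"] > 0 and summary["n_samples"] == 0
```

For a sites-only file that still carries `##FORMAT=` lines, `declares_format_without_samples` returns `True`, which matches the situation described above.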

The relevant parts of the command line (GATK v2.8):

--use_annotation "DP"
--use_annotation "MQRankSum"
--mode "INDEL"
--input:input_0,vcf "/ephemeral/0/condor/dir_20949/tmp-gatk-_t7uu2/input_variants_0.vcf"

Here the file "input_Unknown_0.vcf" is pulled directly from the Broad's FTP (/bundle/2.8/hg19/Mills_and_1000G_gold_standard.indels.hg19.vcf.gz).

We have successfully run VQSR on the SNPs in the same input VCF, using the recommended reference files and nearly identical annotations.

Hm. Can you try deleting the FORMAT definition line in the header? That might do the trick. The file shouldn't need a FORMAT column as far as I can remember. Not sure how we ended up emitting a file with a malformed header, will check that.
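For anyone wanting to try this without editing the file by hand, the following Python sketch drops the `##FORMAT=` definition lines and leaves everything else untouched. It is only an illustration of the suggested edit, not a guaranteed fix; the function names and the output path are hypothetical, and the output is written uncompressed.

```python
import gzip


def strip_format_definitions(lines):
    """Yield every line except the ##FORMAT= header definitions."""
    for line in lines:
        if line.startswith("##FORMAT="):
            continue
        yield line


def rewrite_vcf(src, dst):
    """Copy src to dst (plain text), omitting FORMAT definition lines."""
    # Read gzip-compressed input transparently; plain text otherwise.
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(strip_format_definitions(fin))
```

A call such as `rewrite_vcf("Mills_and_1000G_gold_standard.indels.hg19.vcf.gz", "mills.noformat.vcf")` would produce a copy with only those header lines removed; the data lines and the `#CHROM` column line are passed through unchanged.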

Geraldine Van der Auwera, PhD

• Posts: 5

I have tried a couple of modified versions of the reference:

• Removing the FORMAT fields from the header.
• Removing all but a GT FORMAT field from the header, and adding a GT FORMAT column.

Unfortunately, both of these produced the same error. Is this the same file on the internal FTP server? Can I get a copy that has been successfully used previously?

On an unrelated note -- congrats on 5,000 posts! It's a real testament to the support that you provide!

• Posts: 5

Thanks for your help Geraldine. The recalibration worked perfectly with the file that you specified.