# Mills reference for indel VQSR

Hi all --

This should be a simple problem -- I cannot find a valid version of the Mills indel reference in the resource bundle, or anywhere else online!

All versions of the reference VCF are stripped of genotypes and do not contain a FORMAT column or any additional annotations.

I am accessing the Broad's public FTP, and none of the Mills VCF files in bundle folders 2.5 or 2.8 contain a full VCF. I understand that there are "sites only" VCFs, but I can't seem to find anything else.

Can anyone link me to a version that contains the recommended annotations for indel VQSR, or that can be annotated?

Hi @vasya,

You don't need to have the annotations in the reference VCF for VQSR, you only have to have them in your test VCF. Are you experiencing issues running the tool?
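If it helps to confirm that the test VCF carries the annotations you plan to pass to the tool, here is a minimal sketch in plain Python (not a GATK utility). The function names `open_vcf` and `count_missing_annotations` are made up for illustration, and the annotation names are whatever you supply. It only looks at the INFO column of each record.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def count_missing_annotations(lines, wanted):
    """Count, per annotation key, the records whose INFO field lacks it."""
    missing = {key: 0 for key in wanted}
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        # INFO is the 8th column; entries look like KEY=VALUE or a bare flag.
        info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        for key in wanted:
            if key not in info_keys:
                missing[key] += 1
    return missing
```

For a file on disk this could be called as `count_missing_annotations(open_vcf(path), ["DP", "MQRankSum"])`, where `path` points at your own test VCF. A nonzero count is not necessarily a problem, it just tells you how many records lack that key.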

Thanks for the quick reply Geraldine! I misunderstood what data were being used to train the model.

And yes, I am having trouble with the VQSR of indels. GATK throws the following error:

##### ERROR MESSAGE: Your input file has a malformed header: The FORMAT field was provided but there is no genotype/sample data

The Mills reference is missing a FORMAT column.
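A quick way to see what a given VCF actually declares is to inspect its `#CHROM` line. The sketch below is plain Python, not part of GATK, and `describe_column_header` is a hypothetical name. It assumes the error refers to a FORMAT column being listed on the `#CHROM` line with no sample columns after it, which has not been confirmed in this thread.

```python
# The eight fixed columns every VCF column-header line starts with.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def describe_column_header(lines):
    """Classify a VCF by what follows INFO on its #CHROM line."""
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            extra = columns[len(FIXED_COLUMNS):]
            if not extra:
                return "sites-only"
            if extra[0] != "FORMAT":
                return "unexpected columns after INFO"
            if len(extra) == 1:
                # FORMAT is declared but no sample names follow it.
                return "FORMAT without samples"
            return "genotypes present"
        if not line.startswith("##"):
            break  # reached the records without seeing a #CHROM line
    return "no column header found"
```

Running this on both the input VCF and the resource VCF would show which of the two, if either, lists FORMAT without any samples.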

The relevant parts of the command line (GATK v2.8):

--use_annotation "DP"
--use_annotation "MQRankSum"
--mode "INDEL"
--input:input_0,vcf "/ephemeral/0/condor/dir_20949/tmp-gatk-_t7uu2/input_variants_0.vcf"

Here the file "input_Unknown_0.vcf" is pulled directly from the Broad's FTP (/bundle/2.8/hg19/Mills_and_1000G_gold_standard.indels.hg19.vcf.gz).

We have successfully run the VQSR on SNPs contained in my input VCF using the recommended reference files, and nearly identical annotations.

Hm. Can you try deleting the FORMAT definition line in the header? That might do the trick. The file shouldn't need a FORMAT column as far as I can remember. Not sure how we ended up emitting a file with a malformed header, will check that.
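For reference, dropping the FORMAT definition lines can be done with a few lines of plain Python. This is only a sketch of the edit suggested above, with a made-up function name (`strip_format_definitions`). It removes `##FORMAT=` meta lines and leaves everything else, including the `#CHROM` line, untouched. Whether that is enough to satisfy the parser is not guaranteed.

```python
def strip_format_definitions(lines):
    """Yield VCF lines unchanged, except ##FORMAT= meta lines, which are dropped."""
    for line in lines:
        if line.startswith("##FORMAT="):
            continue
        yield line
```

To apply it to the bundle file, read the lines with `gzip.open(path, "rt")`, pass them through the function, and write the result to a new file so the original stays intact.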

I have tried a couple of modified versions of the reference:

• Removing the FORMAT fields from the header.
• Removing all but a GT FORMAT field from the header, and adding a GT FORMAT column.

Unfortunately, both of these produced the same error. Is this the same file on the internal FTP server? Can I get a copy that has been successfully used previously?

On an unrelated note -- congrats on 5,000 posts! It's a real testament to the support that you provide!

Thanks for your help Geraldine. The recalibration worked perfectly with the file that you specified.