

# VQSR error: "The provided VCF file is malformed at... "

Posts: 4

I am seeing this error on a single human WGS sample:

The provided VCF file is malformed at approximately line number "x": there are 557 genotypes while the header requires that 1525 genotypes be present for all records

Interestingly, when I run VQSR as part of the same pipeline on the same sample several times in a row, "x" changes to a different line number each time. Could someone explain the meaning of this error message in more detail?


Hi Gareth, have you tried validating your VCF (with vcftools)? And can you tell me if your VCF was produced directly by GATK or if it was modified in any way by other tools?

Geraldine Van der Auwera, PhD

• Posts: 4

These VCFs were produced by GATK, in a pipeline that also uses Picard AddOrReplaceReadGroups and MarkDuplicates. While I try to get vcftools running, could you look at the warnings produced by GATK's ValidateVariants? Do you think they are the reason we are seeing the error?

WARN 16:38:37,627 ValidateVariants - ***** the Allele Count (AC) tag is incorrect for the record at position chr5:176515816, 1 vs. 1 *****
WARN 16:38:37,642 ValidateVariants - ***** the Allele Count (AC) tag is incorrect for the record at position chr5:177378571, 1 vs. 1 *****
WARN 16:38:37,723 ValidateVariants - ***** the Allele Count (AC) tag is incorrect for the record at position chr5:179853352, 1 vs. 1 *****
WARN 16:38:37,772 ValidateVariants - ***** the Allele Count (AC) tag is incorrect for the record at position chr6:910087, 1 vs. 1 *****

Might be a symptom, but not the cause... Can you tell me which version you're using and which command lines were run, in order, in the pipeline?

Geraldine Van der Auwera, PhD

• Posts: 255 ✭✭✭

Could you by chance be using the file "1000G_omni2.5.b37.vcf" from the 1.5/b37 GATK resource bundle when running VariantRecalibrator? That file does contain 1525 samples, which suggests to me that your copy of it may be corrupted. That would explain why the error says 1525 genotypes are required but only 557 were found.
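One way to test that hypothesis without running GATK is to compare the number of tab-separated columns in every record with the number in the `#CHROM` header line. This is only a minimal sketch: `check_vcf_columns` is a hypothetical helper name, not a GATK or vcftools command, and it assumes an uncompressed, tab-delimited VCF.

```shell
# Hypothetical helper (not part of GATK): list records whose column count
# differs from the #CHROM header line. Assumes an uncompressed VCF.
check_vcf_columns() {
    awk -F'\t' '
        /^#CHROM/ { expected = NF; next }   # header line defines the column count
        /^#/      { next }                  # skip other meta-information lines
        NF != expected {
            printf "line %d: %d columns, header has %d\n", NR, NF, expected
            bad = 1
        }
        END { exit (bad ? 1 : 0) }          # non-zero status if any record is off
    ' "$1"
}
```

Calling `check_vcf_columns 1000G_omni2.5.b37.vcf` would print nothing and return 0 for a consistent file. A truncated or partially written record would show up as a line with fewer columns than the header.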

• Posts: 4

As for the command lines, here they are:

java -Xmx12g -jar ${TOOLS}GATK/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T RealignerTargetCreator -I ${BAM} -o ${BAM}.intervals;

java -Xmx12g -jar ${TOOLS}GATK/GenomeAnalysisTK.jar -I ${BAM} -R ucsc.hg19.fasta -T IndelRealigner -targetIntervals ${BAM}.intervals -o ${BAM}.realigned.bam

java -Xmx12g -jar ${TOOLS}GATK/GenomeAnalysisTKLite.jar -T UnifiedGenotyper -nt 30 -I $1 -o $1.SNP.vcf -R /home/Pegasus5/HG01140/ucsc.hg19.fasta -glm SNP -metrics $1.SNP.metrics

java -Xmx12g -jar ${TOOLS}GATK/GenomeAnalysisTKLite.jar -T VariantRecalibrator -input $1.novoalign.merged.sorted.rg.dedup.bam.bam.realigned.bam.SNP.vcf -R /home/Pegasus5/HG01140/ucsc.hg19.fasta -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ${HAPMAP} -resource:omni,known=false,training=true,truth=false,prior=12.0 ${KGP} -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 ${DBSNP} -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -recalFile $1.novoalign.merged.sorted.rg.dedup.bam.bam.realigned.bam.SNP.vcf.recal -rscriptFile $1.novoalign.merged.sorted.rg.dedup.bam.bam.realigned.bam.SNP.vcf.plots.R -tranchesFile $1.novoalign.merged.sorted.rg.dedup.bam.bam.realigned.bam.SNP.vcf.tranches -nt 8

Where

HAPMAP="/home/Pegasus5/HG01140/hapmap_3.3.hg19.vcf";
DBSNP="/home/Pegasus5/HG01140/dbsnp_135.hg19.vcf.gz";
KGP="/home/Pegasus5/HG01140/1000G_omni2.5.hg19.vcf";

• Posts: 4

To follow up here: a re-downloaded 1000G_omni2.5.hg19.sites.vcf did the trick! I guess the old copy we had was either partial or out of date. Thanks again for all the help.
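For anyone comparing an old and a re-downloaded resource file, counting the sample columns declared in the `#CHROM` header line is a quick sanity check. This is a sketch under assumptions: `count_vcf_samples` is a hypothetical helper name, and it expects an uncompressed, tab-delimited VCF. Files named `.sites.vcf` are conventionally sites-only, so a count of 0 would be expected for them, though that is worth confirming on your own copy.

```shell
# Hypothetical helper (not part of GATK): print the number of sample columns
# declared in the #CHROM header line of an uncompressed VCF.
count_vcf_samples() {
    awk -F'\t' '
        /^#CHROM/ {
            n = NF - 9                  # 8 fixed columns plus FORMAT precede the samples
            print (n > 0 ? n : 0)       # sites-only headers have no sample columns
            found = 1
            exit
        }
        END { if (!found) exit 1 }      # no #CHROM line: report failure
    ' "$1"
}
```

A header that declares samples means every record must carry that many genotype columns, which is the requirement the "malformed" error message refers to.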