gareth862
I am seeing this error on a single human WGS sample:
The provided VCF file is malformed at approximately line number "x": there are 557 genotypes while the header requires that 1525 genotypes be present for all records
Interestingly, when I run VQSR as part of the same pipeline on the same sample several times in a row, the "x" changes to a different line number each time. Could someone explain what the error message means in more detail?
@gareth862, might I suggest you try the file with the same name from the 2.2 bundle or, if you want to stick with the 1.5 bundle, use 1000G_omni2.5.b37.sites.vcf. Those two are the same file, and neither contains the individual genotypes. VQSR doesn't need the actual genotypes from this resource, and reading them in during VQSR for a whole-genome sample takes longer. The sites-only file is also smaller, so it should be faster to download. I believe @ebanks mentioned this in a previous discussion about the file names when the 2.2 resource bundle was distributed.
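To illustrate what "sites-only" means here, the following is a minimal Python sketch that keeps only the eight fixed VCF columns (CHROM through INFO) and drops FORMAT plus every per-sample genotype column. The function name `to_sites_only` is hypothetical and not part of GATK or the resource bundle; downloading the official sites file remains the simpler route.

```python
def to_sites_only(lines):
    """Yield VCF lines with the FORMAT and per-sample genotype columns removed.

    Hypothetical helper for illustration only. Meta-information lines
    (starting with "##") are passed through unchanged, so any ##FORMAT
    definitions stay in the header even though no record uses them.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
        elif line:
            # Applies to both the #CHROM header line and the data records:
            # keep CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
            yield "\t".join(line.split("\t")[:8])
```

Because it works on any iterable of lines, it can be fed an open file handle and the output written line by line, without loading a whole-genome resource file into memory.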
Answers
Hi Gareth, have you tried validating your VCF (with vcftools)? And can you tell me if your VCF was produced directly by GATK or if it was modified in any way by other tools?
Geraldine Van der Auwera, PhD
These VCFs were produced in combination with Picard AddOrReplaceReadGroups and MarkDuplicates. While I try to get vcftools running, could you look at the warnings produced by GATK's ValidateVariants? Could they be the reason we are seeing the error?
WARN 16:38:37,627 ValidateVariants - ***** the Allele Count (AC) tag is incorrect for the record at position chr5:176515816, 1 vs. 1 *****
WARN 16:38:37,642 ValidateVariants - ***** the Allele Count (AC) tag is incorrect for the record at position chr5:177378571, 1 vs. 1 *****
WARN 16:38:37,723 ValidateVariants - ***** the Allele Count (AC) tag is incorrect for the record at position chr5:179853352, 1 vs. 1 *****
WARN 16:38:37,772 ValidateVariants - ***** the Allele Count (AC) tag is incorrect for the record at position chr6:910087, 1 vs. 1 *****
That might be a symptom, but not the cause... Can you tell me which version you're using and which command lines were run, in order, in the pipeline?
Geraldine Van der Auwera, PhD
Could you by chance be using the file "1000G_omni2.5.b37.vcf" from the 1.5/b37 GATK resource bundle when running VariantRecalibrator? That file does contain 1525 samples, which suggests to me that your copy of the file may be corrupted (hence the message that the header requires 1525 genotypes while only 557 were found).
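The error message compares two counts: the number of sample columns declared on the #CHROM header line and the number of genotype columns actually present on a record. A minimal Python sketch of that comparison follows, assuming a plain-text, tab-delimited VCF; the function name `check_genotype_counts` is hypothetical, and this is not how GATK itself performs the check.

```python
def check_genotype_counts(lines):
    """Yield (line_number, found, expected) for each record whose number of
    genotype columns differs from the number of samples in the header.

    Hypothetical helper for illustration only.
    """
    expected = None
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # 8 fixed columns plus FORMAT come first; the rest are samples.
            expected = max(len(fields) - 9, 0)
            continue
        if expected is None:
            raise ValueError("record found before the #CHROM header line")
        found = max(len(fields) - 9, 0)
        if found != expected:
            yield line_number, found, expected
```

A record cut off partway through, as in a partial download, would be reported with fewer genotypes than the header declares, which matches the "557 vs. 1525" pattern in the error above.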
Kurt, I think you might be right about this, although I am using the 1000G_omni2.5.hg19.vcf. I'm going to try re-downloading and running again.
As for the command lines, here they are:
java -Xmx12g -jar ${TOOLS}GATK/GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T RealignerTargetCreator -I ${BAM} -o ${BAM}.intervals
java -Xmx12g -jar ${TOOLS}GATK/GenomeAnalysisTK.jar -I ${BAM} -R ucsc.hg19.fasta -T IndelRealigner -targetIntervals ${BAM}.intervals -o ${BAM}.realigned.bam
java -Xmx12g -jar ${TOOLS}GATK/GenomeAnalysisTKLite.jar -T UnifiedGenotyper -nt 30 -I $1 -o $1.SNP.vcf -R /home/Pegasus5/HG01140/ucsc.hg19.fasta -glm SNP -metrics $1.SNP.metrics
java -Xmx12g -jar ${TOOLS}GATK/GenomeAnalysisTKLite.jar -T VariantRecalibrator -input $1.novoalign.merged.sorted.rg.dedup.bam.bam.realigned.bam.SNP.vcf -R /home/Pegasus5/HG01140/ucsc.hg19.fasta -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ${HAPMAP} -resource:omni,known=false,training=true,truth=false,prior=12.0 ${KGP} -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 ${DBSNP} -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -recalFile $1.novoalign.merged.sorted.rg.dedup.bam.bam.realigned.bam.SNP.vcf.recal -rscriptFile $1.novoalign.merged.sorted.rg.dedup.bam.bam.realigned.bam.SNP.vcf.plots.R -tranchesFile $1.novoalign.merged.sorted.rg.dedup.bam.bam.realigned.bam.SNP.vcf.tranches -nt 8
Where
HAPMAP="/home/Pegasus5/HG01140/hapmap_3.3.hg19.vcf"
DBSNP="/home/Pegasus5/HG01140/dbsnp_135.hg19.vcf.gz"
KGP="/home/Pegasus5/HG01140/1000G_omni2.5.hg19.vcf"
To follow up here: a re-downloaded 1000G_omni2.5.hg19.sites.vcf did the trick! I guess the old one we had was either partial or out of date? Thanks again for all the help.
Thanks for confirming the problem was solved.
Geraldine Van der Auwera, PhD