ERROR MESSAGE: The provided VCF file is malformed
I have whole genome data for around 500 individuals for which I am running the GATK variant calling pipeline. I have run the exact same pipeline before on a much smaller data set and didn't experience any problems. Thus I am not sure what causes the problem described below and I hope you can provide me with some help on this issue.
Here is a short description of what I have done:
After all the data pre-processing steps, I have run GATK's HaplotypeCaller on each sample's bam-file using the following command:
"java -Xmx4g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I sample.bam --genotyping_mode DISCOVERY
--emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -o sample.gvcf"
As suggested in the Best Practice Guide, I combined multiple gvcf-files using the following command:
"java -Xmx50g -jar GenomeAnalysisTK.jar -T CombineGVCFs -R reference.fasta --variant sample1.gvcf --variant sample2.gvcf (...) -o combined.gvcf".
This step runs without any problem for all my samples, but when I try to genotype them using
"java -Xmx100g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta --variant combined1.gvcf --variant combined2.gvcf (...)",
GATK throws the following error message: "ERROR MESSAGE: The provided VCF file is malformed at approximately line number 86995422: ./.:0:0:0:0,0,0 is not a valid start position in the VCF format"
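The message does not say which of the combined files contains the offending line. As a rough way to narrow it down, the sketch below scans a gVCF for data lines whose second column (POS) is not a plain number, or that have fewer than the eight mandatory VCF columns. It is not part of GATK; the function name `find_malformed_records` and the `max_hits` parameter are made up for illustration, and the idea that the problem is a truncated or corrupted record is only an assumption based on the error text.

```python
import gzip


def find_malformed_records(path, max_hits=10):
    """Return (line_number, line) pairs for suspicious data lines in a VCF/gVCF.

    A line is flagged when it has fewer than 8 tab-separated columns or when
    its POS column (column 2) does not consist only of digits.
    Hypothetical helper, not a GATK tool.
    """
    # Read plain text or gzip/bgzip-compressed files transparently.
    opener = gzip.open if path.endswith(".gz") else open
    hits = []
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            record = line.rstrip("\n")
            fields = record.split("\t")
            if len(fields) < 8 or not fields[1].isdigit():
                hits.append((line_number, record))
                if len(hits) >= max_hits:
                    break  # stop early; one bad line is usually enough
    return hits
```

Calling `find_malformed_records("combined1.gvcf")` and then the same for each other combined file should show which file holds the bad record and what the surrounding line looks like. If a line is flagged, comparing it against the per-sample gVCFs that went into that combined file may show whether the corruption was already present before CombineGVCFs.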
Thank you in advance for your help!