GATK4 MergeVcfs "One or more header lines must be in the header line collection"
Hi! I am trying to use MergeVcfs to merge several VCF files (VarScan2 output files) but I am getting the following error:
gatk MergeVcfs \
    -I A.vcf \
    -I B.vcf \
    -D human_g1k_v37_decoy.dict -O out.vcf
...
java.lang.IllegalArgumentException: One or more header lines must be in the header line collection
...
Unfortunately I cannot find any information about this error message. I have tried using gatk ValidateVariants to validate the input VCF files but this does not return any errors:
gatk ValidateVariants \
    -V A.vcf \
    -R human_g1k_v37_decoy.fasta
...
12:01:11.764 INFO ValidateVariants - Done initializing engine
12:01:11.764 INFO ProgressMeter - Starting traversal
12:01:11.765 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:01:12.641 INFO ProgressMeter - 1:29562369 0.0 43393 2978924.5
12:01:12.642 INFO ProgressMeter - Traversal complete. Processed 43393 total variants in 0.0 minutes.
12:01:12.642 INFO ValidateVariants - Shutting down engine
[July 1, 2018 12:01:12 PM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.03 minutes.
Can anyone familiar with the code point me in the right direction?
The VCF header for B.vcf looks as follows:
##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
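One thing that stands out in the header as pasted: the SS and GPV INFO lines appear to end without the closing `">`. Whether that is what triggers the exception in MergeVcfs is only a guess, not a confirmed diagnosis, but it is easy to check for. Below is a minimal Python sketch that flags structured header lines (INFO, FORMAT, FILTER) that are not terminated with `>` or that contain an unbalanced double quote. The function name `find_unterminated_header_lines` and the sample list are made up for illustration and are not part of GATK or any library.

```python
import re

# Structured header lines look like: ##INFO=<ID=...,Description="...">
STRUCTURED = re.compile(r"^##(INFO|FORMAT|FILTER)=<")


def find_unterminated_header_lines(lines):
    """Return (line_number, text) pairs for INFO/FORMAT/FILTER header
    lines that do not end with '>' or have an odd number of double quotes.

    `lines` is any iterable of strings, e.g. an open VCF file handle.
    This is a heuristic check, not a full VCF header parser.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if not text.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        if not STRUCTURED.match(text):
            continue  # e.g. ##fileformat, ##source
        if not text.endswith(">") or text.count('"') % 2 != 0:
            problems.append((number, text))
    return problems


# Hypothetical sample: three lines copied from the header above.
sample_header = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">',
    '##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant '
    "(0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)",
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR",
]

for number, text in find_unterminated_header_lines(sample_header):
    print(f"line {number}: {text[:60]}")
```

To run it against a real file, something like `with open("B.vcf") as handle: find_unterminated_header_lines(handle)` should work for an uncompressed VCF. If lines are flagged, adding the missing `">` to a copy of the file and retrying the merge would show whether this is related to the error.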