VariantAnnotator 3.6-g89b7209 writes out a VCF file with BCF or other binary content

mmokrejs (Czech Republic), Member
edited December 2016 in Ask the GATK team

Hi,
My shell script loops over several input VCF files produced by SnpEff-4.1e and adjusts their annotation with:

java -jar GenomeAnalysisTK.jar -T VariantAnnotator --num_threads 16 \
    -R ftp.broadinstitute.org/bundle/hg38/hg38bundle/Homo_sapiens_assembly38.fasta \
    -A SnpEff \
    -I "$sample.bam" \
    --dbsnp ftp.broadinstitute.org/bundle/hg38/hg38bundle/dbsnp_144.hg38.Broad-style.vcf \
    --variant "$sample.samtools.bcftools.vcf" \
    --snpEffFile "$sample.samtools.bcftools.SnpEff.vcf" \
    -o "$sample.samtools.bcftools.SnpEff.VariantAnnotator.annotated.vcf"

Below are the first lines of three VCF files created in the same shell loop with the same VariantAnnotator command line. The third file is broken: not only is its header damaged, but its non-comment lines are a binary blob instead of the usual tab-separated VCF text. I do not paste the binary blob here, but the attached temporary file shows the same issue as the final file.

$ head -n 3 $sample/realignedBAM/$sample*.VariantAnnotator.annotated.vcf
==> $sample/realignedBAM/$sample.gatk.HaplotypeCaller.SnpEff.VariantAnnotator.annotated.vcf <==
##fileformat=VCFv4.2
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

==> $sample/realignedBAM/$sample.gatk.UnifiedGenotyper.SnpEff.VariantAnnotator.annotated.vcf <==
##fileformat=VCFv4.2
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

==> $sample/realignedBAM/$sample.samtools.bcftools.SnpEff.VariantAnnotator.annotated.vcf <==
BCF��##fileformat=VCFv4.2
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##FILTER=<ID=PASS,Description="All filters passed">
$
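To spot such files in the loop without opening each one, one could inspect the leading bytes. The sketch below is a hypothetical helper, `vcf_kind` (not part of GATK), assuming only that a BCF stream begins with the ASCII letters `BCF` followed by binary version bytes (which matches the `BCF��` prefix above), that a plain VCF begins with `##`, and that gzip/BGZF-compressed files start with the bytes `1f 8b`. It uses only `od`, `tr` and `gzip`.

```shell
# Hypothetical helper: classify a file by its first three bytes.
vcf_kind() {
  local magic inner
  # First three bytes as lowercase hex, e.g. "424346" for the ASCII text "BCF"
  magic=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
  case "$magic" in
    424346) echo "uncompressed BCF" ;;
    2323*)  echo "plain-text VCF" ;;   # starts with "##"
    1f8b*)
      # gzip/BGZF container: classify the first bytes after decompression
      inner=$(gzip -dc "$1" 2>/dev/null | od -An -tx1 -N3 | tr -d ' \n')
      case "$inner" in
        424346) echo "compressed BCF" ;;
        2323*)  echo "compressed VCF" ;;
        *)      echo "compressed, unknown content" ;;
      esac ;;
    *) echo "unknown" ;;
  esac
}
```

For example, `for f in "$sample"/realignedBAM/"$sample"*.VariantAnnotator.annotated.vcf; do echo "$f: $(vcf_kind "$f")"; done` would be expected to report the third file as uncompressed BCF despite its `.vcf` name.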

Can you confirm whether the program sometimes mistakenly switches itself into BCF output mode?

I see these binary blobs not only in the final VCF file written via the -o switch ("java -jar GenomeAnalysisTK.jar -T VariantAnnotator -o $sample.vcf") but also in the /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub5971417705075239476.tmp files.

Assuming GATK uses BCF internally, could it be that it sometimes simply appends the non-comment lines from the /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub* files to $sample.vcf without actually converting them back to VCF?
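To test that hypothesis on both the final files and the /tmp stub files, one could check whether the data section is still tab-separated text. A minimal sketch, assuming only the VCF rule that every record line carries at least the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the function name `vcf_body_is_text` is hypothetical. It returns 0 if the first non-`#` line has at least 8 tab-separated fields, 1 if it does not, and 2 if there are no data lines at all. In a file like the broken one above, the very first line starts with `BCF` bytes, so it is already counted as a malformed data line.

```shell
# Hypothetical helper: does the first non-'#' line look like a VCF record?
# Exit status: 0 = tab-separated with >= 8 fields, 1 = not, 2 = no data lines.
vcf_body_is_text() {
  LC_ALL=C awk -F'\t' '
    !/^#/ { ok = (NF >= 8); found = 1; exit }
    END   { if (!found) exit 2; exit (ok ? 0 : 1) }
  ' "$1"
}
```

For example, `for f in /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub*.tmp; do vcf_body_is_text "$f" || echo "$f: not tab-separated VCF text"; done` would list the stub files whose body is binary (header-only files are listed too, since they return 2).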
