
VariantAnnotator 3.6-g89b7209 writes out a VCF file with BCF or other binary content

mmokrejs, Czech Republic, Member
edited December 2016 in Ask the GATK team

Hi,
my shell script loops over several input VCF files from SnpEff-4.1e and adjusts their annotation.

java -jar GenomeAnalysisTK.jar -T VariantAnnotator --num_threads 16 \
    -R ftp.broadinstitute.org/bundle/hg38/hg38bundle/Homo_sapiens_assembly38.fasta \
    -A SnpEff -I $sample.bam \
    --dbsnp ftp.broadinstitute.org/bundle/hg38/hg38bundle/dbsnp_144.hg38.Broad-style.vcf \
    --variant $sample.samtools.bcftools.vcf \
    --snpEffFile $sample.samtools.bcftools.SnpEff.vcf \
    -o $sample.samtools.bcftools.SnpEff.VariantAnnotator.annotated.vcf

Below I show three VCF files created in the same shell loop using the same VariantAnnotator command line. The third file is broken: not only is the header broken, but the non-comment lines contain a binary blob instead of the usual TAB-separated text of a VCF file. I do not show the binary blob itself, but the attached temporary file has the same issue as the final file.

$ head -n 3 $sample/realignedBAM/$sample*.VariantAnnotator.annotated.vcf
==> $sample/realignedBAM/$sample.gatk.HaplotypeCaller.SnpEff.VariantAnnotator.annotated.vcf <==
##fileformat=VCFv4.2
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

==> $sample/realignedBAM/$sample.gatk.UnifiedGenotyper.SnpEff.VariantAnnotator.annotated.vcf <==
##fileformat=VCFv4.2
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

==> $sample/realignedBAM/$sample.samtools.bcftools.SnpEff.VariantAnnotator.annotated.vcf <==
BCF��##fileformat=VCFv4.2
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##FILTER=<ID=PASS,Description="All filters passed">

$
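For anyone hitting the same problem, a quick way to spot affected files is to test the first bytes of each output for the BCF magic string. The sketch below assumes the broken files start with the literal ASCII bytes "BCF", as the third file above does. The function names is_bcf and report_bcf are made up for illustration and are not part of GATK.

```shell
#!/bin/sh
# Hypothetical helpers (not part of GATK): detect files that are named
# *.vcf but actually begin with the uncompressed BCF magic bytes "BCF".

# Succeeds (exit status 0) if the file is readable and its first three
# bytes are the ASCII characters "BCF".
is_bcf() {
    [ -r "$1" ] && [ "$(head -c 3 < "$1")" = "BCF" ]
}

# Prints one line for every argument that looks like BCF.
report_bcf() {
    for f in "$@"; do
        if is_bcf "$f"; then
            printf '%s: starts with BCF magic bytes, not text VCF\n' "$f"
        fi
    done
}
```

Inside the loop this could be called as report_bcf "$sample"/realignedBAM/*.VariantAnnotator.annotated.vcf right after VariantAnnotator finishes. It only inspects the start of the file, so it will not notice a file that begins as text and turns binary further down.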

Can you confirm the program sometimes switches itself into a BCF-output mode mistakenly?

I see these binary blobs not only in the final VCF file written out via the "java -jar GenomeAnalysisTK.jar -T VariantAnnotator -o $sample.vcf" switch but even in /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub5971417705075239476.tmp files.

Assuming GATK uses the BCF format internally, maybe it sometimes mistakenly appends the non-comment lines from the /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub* files to $sample.vcf without actually converting them back to VCF?
