Bamout file shows a consistent deletion that is not reported in VCF
Dear GATK team
I ran HaplotypeCaller on a BAM file containing the alignment of a single bacterial sample to its reference genome. To understand the calling process, I wanted to compare the resulting VCF with the bamout file, so I restricted the run to a small region:
java -jar -Xmx8g $BASE/GenomeAnalysisTK.jar -T HaplotypeCaller -R AE014075.fasta -I dedup_3_S3.bam -ploidy 1 -stand_call_conf 30 -stand_emit_conf 10 -o S3_part.vcf -bamout newbamout389088.bam -L AE014075.1:389000-389190
Then I opened three tracks in IGV: the VCF file (S3_part.vcf) on top, then the original BAM file (dedup_3_S3.bam), and lastly the bamout file (newbamout389088.bam). As you can see, the original BAM file shows a deletion at position 389087 and a SNP at 389091, supported by 861 of ~1000 reads (MAPQ > 30). The bamout file, after local reassembly and realignment, instead shows an insertion at 389088 and a 3-nt deletion at 389090, supported by almost 100% of the reads.
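To cross-check read-support figures like the 861 of ~1000 reads above outside IGV, one can walk each read's CIGAR string and tally what it shows at a given reference position. Below is a minimal pure-Python sketch; the `(pos, mapq, cigar)` tuple format, the function names and the example reads are all hypothetical. It only inspects the CIGAR, so it cannot detect SNPs such as the one at 389091 (that would also need the read sequences and the reference).

```python
import re
from collections import Counter

# One (length, op) pair per CIGAR element, e.g. "37M1D50M" -> [("37", "M"), ("1", "D"), ("50", "M")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def events_by_position(pos, cigar):
    """Walk a CIGAR string starting at 1-based reference position `pos` (SAM POS).

    Returns (status, ins_after): `status` maps each covered reference position to
    "aligned" or "del"; `ins_after` holds the positions followed by an insertion.
    """
    status, ins_after = {}, set()
    ref = pos  # next reference base to be consumed
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":  # aligned bases (match or mismatch)
            for p in range(ref, ref + n):
                status[p] = "aligned"
            ref += n
        elif op == "D":  # deleted reference bases
            for p in range(ref, ref + n):
                status[p] = "del"
            ref += n
        elif op == "N":  # skipped reference region, not counted as coverage
            ref += n
        elif op == "I":  # inserted read bases sit after reference base ref - 1
            ins_after.add(ref - 1)
        # S, H and P consume no reference bases
    return status, ins_after


def count_support(reads, target, min_mapq=30):
    """Tally what reads with MAPQ > min_mapq show at `target`.

    `reads` is an iterable of hypothetical (pos, mapq, cigar) tuples.
    An insertion right after `target` takes precedence over "aligned".
    """
    counts = Counter()
    for pos, mapq, cigar in reads:
        if mapq <= min_mapq:
            continue
        status, ins_after = events_by_position(pos, cigar)
        if target in ins_after:
            counts["ins_after"] += 1
        elif target in status:
            counts[status[target]] += 1
    return counts


if __name__ == "__main__":
    # Made-up reads, not taken from dedup_3_S3.bam
    reads = [
        (389050, 60, "37M1D50M"),  # deletes 389087
        (389060, 60, "27M1D40M"),  # deletes 389087
        (389070, 60, "40M"),       # aligned across 389087
        (389050, 10, "37M1D50M"),  # MAPQ 10, filtered out
    ]
    print(dict(count_support(reads, 389087)))  # {'del': 2, 'aligned': 1}
```

With real data, the same three fields could be read from the BAM with a library such as pysam, whose aligned segments expose `reference_start` (0-based, so add 1), `mapping_quality` and `cigarstring`.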
It is important to note that in the bamout track, all artificial haplotypes (pink reads) show the same insertion and deletion. In fact, if you scroll down, you will see that all reads from the original BAM file (blue) contain exactly the same variants too, although not all of them are informative.
Now, in the VCF track at the top, the insertion at 389088 is called but the deletion at 389090 is not. This is confusing: didn't the bamout file just show that all assembled haplotypes contain a deletion at 389090, and that almost all the realigned reads from the original BAM file contain it too?
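When matching events seen in the bamout against VCF records, keep in mind that the VCF specification anchors simple indels on the reference base immediately before the event, so a deletion whose first deleted base is reference position p is reported with POS p - 1. A minimal sketch, assuming haplotype alignments are available as hypothetical `(pos, cigar)` pairs and the called VCF positions as a set of integers, that lists each indel in VCF-style coordinates and reports those with no record at that POS (all names and example values are made up for illustration):

```python
import re

# One (length, op) pair per CIGAR element
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_events(pos, cigar):
    """List (vcf_pos, kind, length) for every indel in an alignment starting at
    1-based reference position `pos`.

    Following the VCF convention, both insertions and deletions are anchored on
    the reference base immediately before the event.
    """
    events = []
    ref = pos  # next reference base to be consumed
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I":
            events.append((ref - 1, "ins", n))
        elif op == "D":
            events.append((ref - 1, "del", n))
            ref += n
        elif op in "M=XN":
            ref += n
        # S, H and P consume no reference bases
    return events


def missing_from_vcf(haplotypes, vcf_positions):
    """Return the distinct indel events, across all haplotypes, whose anchor
    position has no VCF record."""
    seen = set()
    for pos, cigar in haplotypes:
        for event in indel_events(pos, cigar):
            if event[0] not in vcf_positions:
                seen.add(event)
    return sorted(seen)


if __name__ == "__main__":
    # Made-up haplotype alignment and VCF positions, for illustration only
    haplotypes = [(389000, "89M2I1M3D50M")]
    print(indel_events(389000, "89M2I1M3D50M"))
    # [(389088, 'ins', 2), (389089, 'del', 3)]
    print(missing_from_vcf(haplotypes, {389088}))
    # [(389089, 'del', 3)]
```

The check compares anchor positions only, not alleles, so an event that the VCF reports under a different representation (for example, merged with a neighbouring event into one record) would still be flagged as missing.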
Also, maybe a silly question: since the genome's ploidy is 1 and there is only one sample, shouldn't there be only one haplotype by definition? Why does the bamout contain ~10 haplotypes when bamWriterType defaults to CALLED_HAPLOTYPES?
Thanks a lot!