If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Why is HaplotypeCaller dropping half of my reads?
I have been trying HaplotypeCaller to find SNPs and INDELS in viral read data (haploid) but am finding that it throws away around half of my reads and I don't understand why. A small proportion (8%) are filtered out duplicates and 0.05% fail on mapping quality but I can't account for the majority of lost reads. I appreciate that GATK wasn't built for viral sequences but would you have an idea of what could be causing this?
I use the following command after marking duplicates and realigning around indels:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R Ref.fasta -I realigned_reads.bam --genotyping_mode DISCOVERY -ploidy 1 -bamout reassembled.bam -o rawvariants.vcf
I have also tried the same file with UnifiedGenotype and I get the result I expect i.e. most of my reads are retained and I have SNP calls that agree with a VCF constructed in a different program so I assume the reads are lost as part of the local realignment?