HaplotypeCaller with a very low or inconsistent coverage BAM file


I recently had to download some SOLiD datasets (whole genome, non-human) from a paper published in 2010 and perform SNP calling on those reads. I roughly followed the best-practices recommendations and used HaplotypeCaller to call variants after BQSR. The problem is that the number of variants called was far lower than the number reported in the original 2010 paper, and this is before the VariantFiltration step. The original paper used a very relaxed method to detect variants: a position was called polymorphic if 3 independent reads all had the same non-reference nucleotide at that position. Even so, the authors stated that over 95% of the SNPs they reported were true positives. So my question is, which thresholds can I lower so that HaplotypeCaller calls more variants? I have already lowered both -stand_emit_conf and -stand_call_conf to 10. Would you recommend going even lower? Thanks!
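
To make the comparison concrete, here is a minimal Python sketch of the kind of count-based rule the 2010 paper describes. It is only an illustration under assumptions, not the paper's actual code and not how HaplotypeCaller works. The function name `call_site`, the `(read_start, strand, base)` tuple layout, and the idea that "independent" means distinct start position and strand are all hypothetical choices made here for the example.

```python
from collections import defaultdict

VALID_BASES = {"A", "C", "G", "T"}


def call_site(ref_base, observations, min_reads=3):
    """Return non-reference bases supported by at least `min_reads` reads.

    `observations` is an iterable of (read_start, strand, base) tuples for
    the reads covering one position. ASSUMPTION: two reads count as
    independent only if they differ in start position or strand, so exact
    duplicates are counted once.
    """
    ref_base = ref_base.upper()
    support = defaultdict(set)
    for read_start, strand, base in observations:
        base = base.upper()
        # Skip ambiguous calls (e.g. N) and reads that match the reference.
        if base in VALID_BASES and base != ref_base:
            support[base].add((read_start, strand))
    return sorted(b for b, reads in support.items() if len(reads) >= min_reads)


# Three reads with distinct starts/strands support G at a reference-A site.
pileup = [(100, "+", "G"), (105, "+", "G"), (110, "-", "G"), (120, "+", "A")]
print(call_site("A", pileup))
```

A rule like this has no notion of base quality, mapping quality, or genotype likelihood, which is one reason its call set can be much larger than what a likelihood-based caller emits at low coverage.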


  • shlee (Cambridge; Member, Broadie, Moderator, admin)

    Hi @angezou,

    We don't actively support SOLiD data, in that we never work with it. I'm not familiar with SOLiD data myself, so I can't help you with your question. You may be able to find some SOLiD-related posts in our forum, and it may also help to study the HaplotypeCaller features that SOLiD data may be sensitive to. Good luck.
