Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

What to do with really high coverage samples for variant calling?

I have data from amplicon sequencing that is extremely high depth (average of 148K reads at my variants of interest). In running haplotype caller with a single sample and groups of samples, I noticed that I am getting drastically different genotyping calls depending on the run. The outputted allele depth seems to be maxing out for most of samples between 100-500, I figured that is randomly sampling down to 250 reads and each time it producing different results because it is only sampling ~0.3% of my reads. After looking at the different forums and posts on this site I see now that apparently the dcov option doesn't work with haplotype caller and has been disabled. I've run my samples through unified genotyper and I could disable the downsampling and I am seeing the expected results. However, haplotype caller is more sensitive than unified genotyper so it is my understanding that haplotype caller should always be used. I also want to use haplotype caller because one of my variants is a small deletion. I am hoping you can advise me on how I might move forward. Should I just use unified genotyper or is there some change I can make in the haplotype caller to make it work? This is a very small region so I am not concerned about the increased computational time of it using all the reads to make a call.

Best Answers

  • shawpashawpa
    Accepted Answer

    I found an example where the genotype calls are not matching. The sample is G1687 and the relevant SNP is rs2518873. The other post about haplotype caller not calling variants for amplicon sequencing was quite helpful. It seems others are having similar issues. I also am finding it very odd that haplotype caller won't even consistently call the same SNPs. I realize 0/0's are not outputted by default unless it is multisample but it is just missing obviously heterozygous sites. The bamout file has no reads listed in that region. Seems depending on the mix of samples I use as input, the outputted SNPs are different. UnifiedGenotyper picks them all up.


Sign In or Register to comment.