Mutect2 with contamination estimates
How many sites does ContEst need to get an accurate answer?
A couple of my samples give me results like this:
name population population_fit contamination confidence_interval_95_width confidence_interval_95_low confidence_interval_95_high sites
META CEU n/a 57.3 0.8 56.9 57.7 83
57% contamination seems very high. Other samples report around 1000 sites, and their contamination comes out at around 20%. I wonder whether the high result is inaccurate because ContEst is only using 83 sites?
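To pick out the affected samples, a minimal sketch like the following could parse the ContEst summary and flag rows that rest on few sites. It assumes the output is whitespace-delimited with the header shown above. `MIN_SITES` and the function names are hypothetical, and the cutoff of 500 is an arbitrary placeholder, not a GATK recommendation.

```python
# Sketch only: parse a whitespace-delimited ContEst summary and flag
# rows whose estimate is based on few sites.
# MIN_SITES is an arbitrary, hypothetical cutoff (not a GATK value).
MIN_SITES = 500


def parse_contest(text):
    """Return one dict per data row, keyed by the header column names."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def low_site_rows(records, min_sites=MIN_SITES):
    """Return the rows whose 'sites' count is below min_sites."""
    return [r for r in records if int(r["sites"]) < min_sites]


# The row quoted in this post, used as example input.
example = """\
name population population_fit contamination confidence_interval_95_width confidence_interval_95_low confidence_interval_95_high sites
META CEU n/a 57.3 0.8 56.9 57.7 83
"""

records = parse_contest(example)
for r in low_site_rows(records):
    print(f'{r["name"]}: contamination {r["contamination"]}% from only {r["sites"]} sites')
```

This only identifies which estimates are based on few sites. It does not say whether 83 sites is enough for an accurate answer, which is the question here.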
How does Mutect2 use the output from ContEst? I would like to run Mutect2 with and without the ContEst results, as I am concerned that very few SNPs will pass if such a high level of contamination is assumed. However, Mutect2 is running very slowly and I don't have the compute resources to run it twice. Is there any way I can filter the output of Mutect2 to take the contamination estimates into account?
Any thoughts would be much appreciated.