Mutect2 with contamination estimates
How many sites does ContEst need to get an accurate answer?
A couple of my samples give me results like this:
name  population  population_fit  contamination  confidence_interval_95_width  confidence_interval_95_low  confidence_interval_95_high  sites
META  CEU         n/a             57.3           0.8                           56.9                        57.7                         83
A contamination estimate of 57% seems very high. My other samples report around 1000 sites, and their contamination comes out at around 20%. Could the high estimate be inaccurate because ContEst is only using 83 sites?
How does Mutect2 use the output from ContEst? I would like to run Mutect2 with and without the ContEst results, as I am concerned that very few SNPs will pass if such a high level of contamination is assumed. However, Mutect2 runs very slowly and I don't have the compute resources to run it twice. Is there any way to filter the Mutect2 output afterwards to take the contamination estimates into account?
Any thoughts would be much appreciated.