Using UG with a pool of 144 haploids

Dear GATK team,

I am trying to run the UG on 2 samples, each a pool of 144 yeast strains (~200M reads for the 2 samples).
The run gets stuck after ~40,000 bases. My guess is that a ploidy of 144 is too heavy for it to proceed.
I thought of splitting the run with the -L option into (lots of) intervals along the chromosome, running UG on each one separately, and merging the results. Even if this works, it will take a very long time to run. Any suggestions on how to overcome this?

Thanks

Answers

  • delangel (Broad Institute, Member)

    Your approach will work if you have a cluster with many nodes. Unfortunately there are no easy solutions, as calling pools with such a high ploidy is computationally very expensive. A partial workaround is to set -maxAltAlleles 1, since complexity goes as (ploidy)^N_alleles. You won't get multiallelic calls, but it will certainly be faster.
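To make the two ideas in this thread concrete, here is a minimal Python sketch, not part of GATK itself. It assumes the scaling quoted above, (ploidy)^N_alleles, taken at face value, and it builds interval strings in the contig:start-stop form that can be passed to -L, one per cluster job. The function names, the contig name `chrI`, its length, and the chunk size are illustrative placeholders, not values from the original post.

```python
def split_into_intervals(contig, length, chunk_size):
    """Return 1-based, inclusive interval strings 'contig:start-stop'.

    Each string is meant to be given to a separate run via -L.
    """
    intervals = []
    for start in range(1, length + 1, chunk_size):
        # The last chunk is clipped to the end of the contig.
        stop = min(start + chunk_size - 1, length)
        intervals.append(f"{contig}:{start}-{stop}")
    return intervals


def relative_cost(ploidy, n_alleles):
    """Relative per-site cost under the (ploidy)^N_alleles scaling quoted above."""
    return ploidy ** n_alleles


# Hypothetical example values: contig name, length, and chunk size are placeholders.
chunks = split_into_intervals("chrI", 230218, 50000)
for interval in chunks:
    print(interval)

# Compare the quoted scaling for 1, 2 and 3 alleles at ploidy 144.
for n in (1, 2, 3):
    print(n, relative_cost(144, n))
```

Under that scaling, each additional allele multiplies the per-site cost by the ploidy (144 here), which is why limiting the number of alternate alleles helps so much. Smaller chunks give more parallelism but also more output files to merge afterwards.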
