Using UG with a pool of 144 haploids

Dear GATK team,

I am trying to run UnifiedGenotyper (UG) on 2 samples, each a pool of 144 yeast strains (~200M reads for the 2 samples).
The run gets stuck after ~40,000 bases. My guess is that a ploidy of 144 is too computationally heavy for the run to proceed.
I thought of splitting the run with the -L option into (many) intervals along each chromosome, running UG on each interval separately, and merging the results. Even if this works, it will take a very long time to run. Any suggestions on how to overcome this?

Thanks

Answers

  • delangel (Broad Institute, Member)

    Your approach will work if you have a cluster with many nodes. Unfortunately there are no easy solutions, as calling pools with such a high ploidy is computationally very expensive. A partial workaround is to set -maxAltAlleles 1, since complexity goes as (ploidy)^N_alleles. You won't get multiallelic calls, but it will certainly be faster.
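    To get a rough feel for why limiting alternate alleles helps, the sketch below counts the distinct unordered genotypes a single pool can take at one site. This is only an illustration, not UG's actual cost model: it assumes the work per site scales with the number of genotype combinations, and `genotype_count` is a hypothetical helper name. The count is the standard "multiset" formula C(ploidy + alleles - 1, alleles - 1), where `alleles` includes the reference allele.

```python
from math import comb


def genotype_count(ploidy: int, n_alleles: int) -> int:
    """Number of distinct unordered genotypes for one sample.

    ploidy    -- chromosome copies in the sample (144 for this pool)
    n_alleles -- total alleles considered at the site, reference included
    """
    if ploidy < 1 or n_alleles < 1:
        raise ValueError("ploidy and n_alleles must be positive")
    # Ways to distribute `ploidy` copies among `n_alleles` alleles.
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


if __name__ == "__main__":
    ploidy = 144
    for max_alt in (1, 2, 3):
        n_alleles = max_alt + 1  # reference + alternate alleles
        print(f"max alt alleles = {max_alt}: "
              f"{genotype_count(ploidy, n_alleles):,} genotypes")
```

    Under these assumptions, one alternate allele gives 145 possible genotypes per pool, two give 10,585, and three give 518,665, which is why capping alternate alleles at 1 cuts the search space so sharply.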
