GATK on RAD Data - extraordinarily long run times
I have Type 2B RAD data from many individuals across several populations of my non-model species, mapped with Bowtie 0.12.8 to a reference database built by extracting all potential RAD sites from the available genome. I would like to do a first-pass UnifiedGenotyper run on a single individual, but even on a supercomputer and with the -nct and -nt flags, GATK estimates that it will need 4.9 weeks to finish!
A collaborator suggested that GATK may simply not handle a large number of reference contigs well, but I have already reduced my reference database from the 1.6 million possible tags to the 95,000 tags that were seen at least 100 times in total across all my individuals.
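The coverage filter described above (keeping only tags seen at least 100 times across all individuals) could be sketched roughly as follows. This is a minimal sketch, not the actual pipeline used here. It assumes the Bowtie alignments are available as SAM text (one input per individual) and counts only primary, mapped reads toward a tag's total. The function names count_tag_coverage, read_fasta and filter_reference are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def count_tag_coverage(sam_inputs: Iterable[Iterable[str]]) -> Counter:
    """Sum primary mapped-read counts per reference tag over several SAM inputs.

    Assumption: each input is an iterable of SAM text lines (e.g. an open file),
    one per individual.
    """
    counts: Counter = Counter()
    for sam in sam_inputs:
        for line in sam:
            if line.startswith("@"):  # skip SAM header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:  # not a complete SAM alignment record
                continue
            flag, rname = int(fields[1]), fields[2]
            if flag & 0x4 or rname == "*":  # read is unmapped
                continue
            if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
                continue
            counts[rname] += 1
    return counts


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            name, seq = line[1:].split()[0], []
        elif line:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def filter_reference(fasta_lines: Iterable[str], counts: Counter,
                     min_count: int = 100) -> List[Tuple[str, str]]:
    """Keep only tags whose summed read count reaches min_count."""
    return [(name, seq) for name, seq in read_fasta(fasta_lines)
            if counts[name] >= min_count]


# Hypothetical usage:
#   with open("ind1.sam") as a, open("ind2.sam") as b:
#       counts = count_tag_coverage([a, b])
#   with open("all_tags.fa") as ref:
#       kept = filter_reference(ref, counts, min_count=100)
```

Because each tag becomes its own contig, a filtered FASTA like this would also need a fresh .fai index and .dict sequence dictionary before GATK will accept it as a reference.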
Does GATK's run time really scale with the number of contigs like this? Are there any tips for bringing the run time down to something more reasonable?