Speed of UnifiedGenotyper with ploidy > 25
First of all, thanks for providing such a great set of tools and the support to go with it. Your documentation and forum help a lot.
I am doing a GWAS using pooled Drosophila samples with 50 to 100 individuals per sample, and tried calling SNPs with the UnifiedGenotyper (GATK 2.3.9). It runs fine at low ploidy, taking about 4 minutes per megabase at ploidy=10 and 15 minutes at ploidy=25, but at ploidy=100 it has been running for days. It does not seem to be a memory issue, as GATK barely uses the 10 GB reserved for it. The coverage for my samples is quite low, ranging from 50x to 100x. So I was wondering whether I am doing something wrong, or whether the algorithm simply gets slow when it has to consider so many potential genotypes. In the end I am not interested in the most likely genotype combination, only in the SNP positions, as I am doing additional testing for each SNP afterwards.
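For a rough sense of scale, the number of distinct unordered genotypes for one sample with ploidy P and k alleles is the multiset count C(P + k - 1, k - 1). The short Python sketch below evaluates that for the ploidies I tried, assuming 4 alleles (the reference plus the 3 alternates allowed by -maxAltAlleles 3). The function name genotype_count is only for illustration, and whether UnifiedGenotyper really enumerates every one of these combinations is my assumption, not something I have confirmed.

```python
from math import comb  # available in Python 3.8+


def genotype_count(ploidy: int, n_alleles: int) -> int:
    """Number of unordered genotypes: ways to distribute `ploidy`
    chromosome copies among `n_alleles` alleles (multisets)."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


# Assumption: 4 alleles = reference + 3 alternates (-maxAltAlleles 3).
N_ALLELES = 4

for ploidy in (10, 25, 100):
    print(f"ploidy={ploidy:>3}: {genotype_count(ploidy, N_ALLELES)} genotypes per sample")
```

Under these assumptions the count goes from 286 at ploidy=10 to 3276 at ploidy=25 and 176851 at ploidy=100, so the per-site genotype space grows by more than 600-fold between the fastest and the slowest of my runs.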
I was also wondering how much the likelihoods for called SNPs would be affected by setting a lower ploidy than the samples actually have, given that the coverage is not that high.
Also, is there a difference between using the parameters GENERALPLOIDYSNP and SNP for --genotype_likelihoods_model?
The command line I am using is:
java -Xmx10g -jar GATK_2.3.9/GenomeAnalysisTK.jar --num_threads 3 -dt NONE -L 3L:1000000-2000000 -R combined_genomes.fa -T UnifiedGenotyper -I ../GATK_realign/9167.3L.1_10mb_sanger_RG_real.bam -I ../GATK_realign/9456.3L.1_10mb_RG_real.bam -I ../GATK_realign/9457.3L.1_10mb_RG_real.bam -o all_real.3L.1_10mb_sanger_RG_unif_real_region_pl10.vcf --output_mode EMIT_VARIANTS_ONLY --sample_ploidy 10 -glm GENERALPLOIDYSNP -pnrm EXACT_GENERAL_PLOIDY -maxAltAlleles 3 -stand_call_conf 80.0 -stand_emit_conf 50.0 -
Thanks in advance and all the best,