Speed of UnifiedGenotyper with ploidy > 25

lukasendler Member
edited February 2013 in Ask the GATK team

Hi,
First of all, thanks for providing such a great set of tools and the support to go with it. Your documentation and forum help a lot.
I am doing a GWAS using pooled Drosophila samples with 50 to 100 individuals per sample, and tried calling SNPs using the UnifiedGenotyper (GATK 2.3.9). It runs fine at low ploidy, taking about 4 minutes per megabase at ploidy=10 and 15 at ploidy=25, but at ploidy=100 it runs for days. It does not seem to be a memory issue, as GATK barely uses the 10 GB reserved for it. The coverage for my samples is quite low, ranging from 50x to 100x. So I was wondering whether I am doing something wrong, or whether the algorithm just gets slow when it has to consider too many potential genotypes. In the end I am not interested in the most likely genotype combination, only in the SNP positions, as I do additional testing for each SNP afterwards.
I was also wondering how much the likelihoods for called SNPs would be influenced by setting a lower ploidy than the samples actually have, given that the coverage is not that high.
Also, is there a difference between using the parameters GENERALPLOIDYSNP and SNP for --genotype_likelihoods_model?
The commandline I am using is:
java -Xmx10g -jar GATK_2.3.9/GenomeAnalysisTK.jar --num_threads 3 -dt NONE -L 3L:1000000-2000000 -R combined_genomes.fa -T UnifiedGenotyper -I ../GATK_realign/9167.3L.1_10mb_sanger_RG_real.bam -I ../GATK_realign/9456.3L.1_10mb_RG_real.bam -I ../GATK_realign/9457.3L.1_10mb_RG_real.bam -o all_real.3L.1_10mb_sanger_RG_unif_real_region_pl10.vcf --output_mode EMIT_VARIANTS_ONLY --sample_ploidy 10 -glm GENERALPLOIDYSNP -pnrm EXACT_GENERAL_PLOIDY -maxAltAlleles 3 -stand_call_conf 80.0 -stand_emit_conf 50.0 -nct 4
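
As a rough illustration of why the number of potential genotypes might matter here, the sketch below counts the distinct unordered genotypes a single sample can have for a given ploidy and number of alleles (the standard multiset count). It only shows how the search space grows with --sample_ploidy and -maxAltAlleles. It is not a model of how UnifiedGenotyper computes likelihoods internally, and the function name `genotype_count` is made up for this example.

```python
from math import comb


def genotype_count(ploidy: int, n_alleles: int) -> int:
    """Count unordered genotypes: multisets of size `ploidy` over `n_alleles` alleles."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


if __name__ == "__main__":
    # n_alleles includes the reference allele, so -maxAltAlleles 3
    # corresponds to at most 4 alleles per site.
    for ploidy in (10, 25, 100):
        counts = {n: genotype_count(ploidy, n) for n in (2, 3, 4)}
        print(f"ploidy={ploidy}: {counts}")
```

With four alleles the count rises from 286 at ploidy 10 to 176851 at ploidy 100, while with two alleles it only rises from 11 to 101, so the allowed number of alternate alleles has a large effect on the size of the space.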

Thanks in advance and all the best,
Lukas

Answers

  • lukasendler Member

    Thanks for the answer and the tip about the number of alternative alleles.
    I compared the results with ploidy 10, 25 and 50 and found only small differences for most predicted SNPs: the quality values of 90 percent of predictions vary by less than 10 percent between p=25 and p=50, and most of the larger variation occurs at qualities below 100.
    As I can perfectly well live without the low-frequency alleles, I will stick to p=25 ;)
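
    A comparison like the one described above could be sketched as follows. This is not the script actually used for the numbers in this post. It assumes plain tab-separated VCF text, matches sites by chromosome and position only, and defines "varying less than 10 percent" as an absolute QUAL difference of at most 10 percent of the larger value. The function names and the example records are invented for illustration.

```python
def read_quals(vcf_lines):
    """Map (chrom, pos) -> QUAL, skipping header lines and records without a QUAL."""
    quals = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, qual = fields[0], int(fields[1]), fields[5]
        if qual != ".":
            quals[(chrom, pos)] = float(qual)
    return quals


def fraction_within(quals_a, quals_b, tolerance=0.10):
    """Fraction of shared sites whose QUAL values differ by at most `tolerance` (relative)."""
    shared = quals_a.keys() & quals_b.keys()
    if not shared:
        return 0.0
    close = sum(
        1
        for site in shared
        if abs(quals_a[site] - quals_b[site])
        <= tolerance * max(quals_a[site], quals_b[site])
    )
    return close / len(shared)


if __name__ == "__main__":
    # Made-up records standing in for two call sets (e.g. ploidy 25 and ploidy 50).
    calls_p25 = [
        "##fileformat=VCFv4.1",
        "3L\t1000100\t.\tA\tG\t250.0\t.\t.",
        "3L\t1000200\t.\tC\tT\t90.0\t.\t.",
        "3L\t1000300\t.\tG\tA\t120.0\t.\t.",
    ]
    calls_p50 = [
        "##fileformat=VCFv4.1",
        "3L\t1000100\t.\tA\tG\t240.0\t.\t.",
        "3L\t1000200\t.\tC\tT\t60.0\t.\t.",
    ]
    print(fraction_within(read_quals(calls_p25), read_quals(calls_p50)))
```

    To run it on real output, pass open file handles to `read_quals` instead of the in-memory lists.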