Running time of UnifiedGenotyper

Hi, I'm using UnifiedGenotyper to call SNPs on one lane of Illumina RNA-seq data; the BAM file is ~15 GB. The command I used is:
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R assembly.fa -I seq.bam --out result.vcf -ploidy 48 -stand_call_conf 20 -stand_emit_conf 20.0
I ran it with 2 CPUs and 72 GB of memory. It has been running for 3 days and still hasn't finished (I also tried -nt 12 to run it on 12 CPUs, but the speed didn't improve significantly). In addition, only 6000 loci have been written to the VCF file so far, whereas the other pipelines I ran on the same data at the same time have all finished and reported ~1 million loci.
So can anybody tell me how long UnifiedGenotyper usually takes to analyse one lane of Illumina RNA-seq data? Thanks.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi there,

    Are you by any chance using a draft genome with many contigs as reference? That could explain why you're seeing such slow performance. It could also be the high ploidy -- I don't have experience with higher ploidies, so I'm not sure what the consequences are for performance.

    One way to make it go faster would be to pass in a list of target intervals to restrict the analysis to regions of interest.

  • yeaman (Member)

    I have also found that setting the java flags like so:

    java -jar -Xmx4G -XX:MaxPermSize=4G -XX:PermSize=4G GenomeAnalysisTK.jar …

    can really improve the speed (these can be set to higher values, depending on your machine). I had one job that was reporting a 70 hour run time and when I set these flags it came back at under 2 hours.
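
As a rough illustration of the target-interval suggestion above, here is a minimal sketch that builds an interval list from a samtools .fai index and keeps only the longer contigs. The function name make_intervals, the length cutoff, and the file name targets.intervals are assumptions made for this example and do not come from the thread; -L is the GATK option for restricting analysis to intervals. Whether dropping short contigs suits your data is a judgment call.

```shell
# Sketch only: build a GATK-style interval list from a samtools .fai index,
# keeping contigs at least min_len bases long (the cutoff is an assumption).
make_intervals() {
    fai="$1"
    min_len="$2"
    # .fai columns are tab-separated: contig name, contig length, ...
    awk -F'\t' -v min="$min_len" '$2 >= min { print $1 ":1-" $2 }' "$fai"
}

# Possible usage (file names other than those in the question are hypothetical):
#   samtools faidx assembly.fa
#   make_intervals assembly.fa.fai 1000 > targets.intervals
#   java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R assembly.fa -I seq.bam \
#        -L targets.intervals --out result.vcf -ploidy 48 \
#        -stand_call_conf 20 -stand_emit_conf 20.0
```

Each output line has the form contig:start-end, which GATK accepts in a file ending in .intervals or .list.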
