Recommendations for running UnifiedGenotyper on many samples

Do you have any recommendations on settings for running UnifiedGenotyper on many samples? I'm currently running 1 Mbp windows with 4 GB of memory on 3,500 BAM files at about 4x average depth.

Currently, jobs are estimated to take from 4 days to 5 weeks per 1 Mbp interval.
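For readers who want to reproduce the setup described above, here is a minimal sketch of splitting a contig into 1 Mbp windows and building one UnifiedGenotyper command per window. It assumes GATK 3.x-style arguments (`-T`, `-R`, `-I`, `-L`, `-o`); the file names (`bams.list`, `reference.fasta`, `GenomeAnalysisTK.jar`), the contig name, and the helper names `windows` and `build_command` are hypothetical placeholders, not taken from the original post.

```python
def windows(contig_length, size=1_000_000):
    """Yield 1-based inclusive (start, end) windows covering a contig."""
    for start in range(1, contig_length + 1, size):
        yield start, min(start + size - 1, contig_length)


def build_command(contig, start, end, bam_list="bams.list",
                  reference="reference.fasta", jar="GenomeAnalysisTK.jar",
                  heap="4g"):
    """Return one UnifiedGenotyper invocation as an argument list.

    Assumes GATK 3.x-style arguments; all file names are placeholders.
    """
    interval = f"{contig}:{start}-{end}"
    output = f"{contig}_{start}_{end}.vcf"
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        "-T", "UnifiedGenotyper",
        "-R", reference,
        "-I", bam_list,      # a .list file naming the BAM files
        "-L", interval,      # restrict calling to this window
        "-o", output,
    ]


if __name__ == "__main__":
    # Hypothetical 2.5 Mbp contig, giving three windows.
    for start, end in windows(2_500_000):
        print(" ".join(build_command("chr1", start, end)))
```

Each printed line can be submitted as an independent cluster job, and the per-window VCFs combined afterwards.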

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)
  • atks (Member)

    Yes. I'm using a scatter-gather approach for parallelization; multithreading would probably add overhead that I can avoid by splitting my jobs. Does the Broad have any recommendations for the JVM memory setting for 3,500 individuals? Do you have any expectations for how long it takes to call 3,500 individuals in a 1 Mbp window? Is there a guideline on how running time scales with sample size? I can run jobs with 300 samples easily and efficiently on UnifiedGenotyper, but increasing the sample count roughly tenfold increases the running time far more than linearly: it took 3 hours to process 300 samples, but the estimate for 3,500 samples is up to 4 weeks. Is this expected?
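To put a number on "not linear", the two timings quoted above can be fitted to a power law, time ∝ samples^k. This is only a rough sketch: it uses two data points, one of which is a job-scheduler estimate rather than a measurement, and the power-law form is an assumption made here, not a documented GATK scaling rule. The helper name `scaling_exponent` is hypothetical.

```python
import math


def scaling_exponent(n_small, t_small, n_large, t_large):
    """Estimate k in t ~ n**k from two (sample count, time) observations."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


hours_300 = 3.0            # reported: 3 hours for 300 samples
hours_3500 = 4 * 7 * 24    # reported upper estimate: 4 weeks, in hours

k = scaling_exponent(300, hours_300, 3500, hours_3500)
print(f"implied exponent k = {k:.2f}")
```

With these figures the implied exponent comes out a little above 2, so the reported slowdown is roughly quadratic in sample count, or slightly worse, rather than linear.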

  • atks (Member)

    ok, will give that a shot too. thanks!

    btw, is there any way to change that frowny face on my GATK profile?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Sure, just go to your profile (click on your name in the top left corner), select Edit profile in the drop-down menu on the right (icon is a person silhouette), and there'll be an option in the left-hand menu to change your profile picture.
