
GATK SNP calling

Hi,

Can anyone advise whether splitting the BAM files or splitting the -L region gives the bigger speedup? Assume I have 500 BAM files: will splitting the BAM files into 22 separate chromosomes increase the speed further compared to splitting by -L region (intervals)? I'd appreciate advice, especially from anyone who has tried this before. Thank you.
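For reference, the "split by -L interval" option can be sketched as below: every job reads the same, unsplit BAM files and only the -L value changes, one job per autosome. This is a minimal sketch under several assumptions: GATK 2/3-style UnifiedGenotyper syntax (`-T`, `-R`, `-I`, `-L`, `-glm`, `-nt`, `-o`), coordinate-sorted and indexed BAMs (so GATK can jump to the requested chromosome rather than read the whole file), and hypothetical file names (`ref.fasta`, `GenomeAnalysisTK.jar`, `snps.<contig>.vcf`). Contig naming (`chr1` vs `1`) depends on your reference. It does not answer which approach is faster; it only shows how the interval split is set up.

```python
import shlex
from typing import List


def build_scatter_commands(
    bams: List[str],
    reference: str = "ref.fasta",            # hypothetical reference path
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical jar location
    contig_prefix: str = "chr",              # use "" for b37-style names ("1", "2", ...)
    n_threads: int = 4,
) -> List[List[str]]:
    """Return one UnifiedGenotyper command (as an argv list) per autosome.

    All jobs take the same unsplit BAM files; only the -L interval differs,
    so each job restricts its work to a single chromosome.
    """
    commands = []
    for i in range(1, 23):  # autosomes 1..22
        contig = f"{contig_prefix}{i}"
        cmd = ["java", "-jar", gatk_jar, "-T", "UnifiedGenotyper", "-R", reference]
        for bam in bams:
            cmd += ["-I", bam]  # one -I per input BAM
        cmd += [
            "-glm", "SNP",          # call SNPs only
            "-L", contig,           # restrict this job to one chromosome
            "-nt", str(n_threads),  # multi-threading within the job
            "-o", f"snps.{contig}.vcf",
        ]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    cmds = build_scatter_commands(["sample1.bam", "sample2.bam"])
    print(len(cmds))
    # Print the first job as a shell-ready command line.
    print(" ".join(shlex.quote(arg) for arg in cmds[0]))
```

The 22 per-chromosome VCFs would then need to be merged after the jobs finish; how the jobs are dispatched (cluster scheduler, local processes) is left open here.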


Answers

  • Jayce Member

    Hi, I was referring to the UnifiedGenotyper (SNP calling) in GATK.

  • Geraldine_VdAuwera (Cambridge, MA): Member, Administrator, Broadie

    Have a look at these documents, which cover the available parallelism modes:

    http://www.broadinstitute.org/gatk/guide/tagged?tag=parallelism&tab=docs

    Note that the -L argument is meant to specify target intervals if you are working with exome data.

  • Jayce Member

    Thanks for your answer, Geraldine. I understand that parallelism (multi-threading with -nt) helps, but I want to confirm whether the speed is equivalent: with the same -nt setting, is splitting by -L region equivalent to splitting the BAM files? I really need this answer, on top of -nt multi-threading, to speed up processing of my very large sample set. I don't want to spend time splitting the BAM files if that is equivalent (speed-wise) to splitting by -L interval regions. Thank you for your help and advice :)

  • Jayce Member
    edited March 2013

    Thank you for your clear advice. Thank you very much :)
