GATK: SNP calling

Jayce

Hi,

Can anyone advise whether splitting the BAM files or splitting the -L region gives the bigger speed-up? Assume I have 500 BAM files: will splitting the BAM files into the 22 different chromosomes speed things up more than splitting the -L region (intervals)? I'd appreciate advice from anyone who has tried this before. Thank you.
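For reference, the "split by -L" option can be set up without touching the BAM files at all: every job reads the same inputs and -L restricts it to one chromosome. Below is a minimal sketch that only builds the per-chromosome UnifiedGenotyper command lines (GATK 2.x-style `-T`, `-R`, `-I`, `-L`, `-nt`, `-o` arguments). It assumes coordinate-sorted, indexed BAMs; the jar path, reference, BAM list file (`.list`) and output prefix are hypothetical; and the `chr1`..`chr22` contig names are an assumption that must match your reference (b37, for example, uses `1`..`22`). It says nothing about which approach is faster.

```python
# Sketch: one UnifiedGenotyper job per chromosome, all jobs sharing the same BAMs.
# File names and contig naming below are hypothetical placeholders.
AUTOSOMES = tuple(f"chr{i}" for i in range(1, 23))


def unified_genotyper_commands(reference, bam_list, out_prefix,
                               contigs=AUTOSOMES, nt=4,
                               gatk_jar="GenomeAnalysisTK.jar"):
    """Return one argv list per contig; -L limits each job to that contig."""
    commands = []
    for contig in contigs:
        commands.append([
            "java", "-jar", gatk_jar,
            "-T", "UnifiedGenotyper",
            "-R", reference,
            "-I", bam_list,          # assumed: a .list file naming all 500 BAMs
            "-L", contig,            # interval restriction instead of split BAMs
            "-nt", str(nt),          # data threads within each job
            "-o", f"{out_prefix}.{contig}.vcf",
        ])
    return commands


if __name__ == "__main__":
    for cmd in unified_genotyper_commands("ref.fasta", "all_samples.bam.list", "calls"):
        print(" ".join(cmd))
```

Each job writes its own per-chromosome VCF, so the outputs still need to be concatenated afterwards.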


  • Jayce

    Hi, I was referring to the UnifiedGenotyper (SNP calling) in GATK.

  • Geraldine_VdAuwera (Administrator, GATK Dev)

    Have a look at these documents, which cover the available parallelism modes:

    http://www.broadinstitute.org/gatk/guide/tagged?tag=parallelism&tab=docs

    Note that the -L argument is meant to specify target intervals if you are working with exome data.

    Geraldine Van der Auwera, PhD

  • Jayce

    Thanks, Geraldine, for your answer. I understand that parallelism (multi-threading with -nt) does help, but I want to confirm whether the speed is equivalent: with the same -nt settings, is splitting the -L region equivalent to splitting the BAM files? I really need this answer, on top of -nt multi-threading, to speed up my very large sample sizes. I don't want to spend time splitting the BAM files if that is equivalent (speed-wise) to splitting the -L interval regions. Thank you for your help and advice :)

  • Jayce
    edited March 2013

    Thank you for your clear advice, thank you very much :)

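On the follow-up about combining interval splits with -nt: when several per-chromosome jobs run at once, each with its own -nt, the thread counts multiply, so the number of concurrent jobs has to be capped to fit the machine. Here is a minimal sketch of that bookkeeping, assuming a single multi-core host and that each job uses about `nt` cores. The helper names are hypothetical, this is not a GATK feature, and it makes no claim about which split is faster.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def max_concurrent_jobs(nt, cores=None):
    """How many jobs with -nt threads each fit on `cores` cores (at least 1)."""
    cores = cores or os.cpu_count() or 1
    return max(1, cores // nt)


def run_all(commands, max_jobs):
    """Run each argv list as a subprocess, at most `max_jobs` at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(argv):
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_one, commands))
```

For example, with 22 per-chromosome commands and `-nt 4` on a 16-core machine, `max_concurrent_jobs(4, cores=16)` allows 4 jobs at a time, and `run_all(commands, 4)` works through the rest as slots free up.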