
CountCovariates timings

TWD_TWD Member
edited September 2012 in Ask the GATK team

Hello GATK team,

I am running CountCovariates with -nt 16, and for a 30x whole-genome BAM this step takes ~12 hrs to finish.
Is this a normal runtime? Are there ways to speed it up further? I have a very good machine (48 GB RAM, 2.7 GHz) and nothing else is running on it other than GATK. I'd appreciate suggestions for faster analyses. Command:

java -Xmx44g -jar /data/wtembe/iSAAC_testing/GATK_local/GenomeAnalysisTK-1.6-13-g91f02df/GenomeAnalysisTK.jar \
-T CountCovariates \
-I ${BAM_FILE} \
-recalFile ${BAM_FILE}.recal.csv \
-knownSites ${KNOWN_SNPS} \
-R ${REF_FASTA} \
-nt ${PROCS}

Best,

TWD

Answers

  • ebanks, Broad Institute (Member, Broadie, Dev ✭✭✭✭)

    Yes, these runtimes look reasonable. At some point (likely well before you hit 16 threads) the parallelization bottoms out and you don't gain anything by adding more threads. The better solution to achieve more parallelization is to scatter-gather your data (as described elsewhere on this forum).
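For readers unfamiliar with the term, the scatter half of scatter-gather could look roughly like the dry-run sketch below: one CountCovariates job per contig, each restricted with the engine's -L interval argument, so the jobs can be run side by side or submitted to a cluster. This is only an illustration under assumptions, not the team's recommended recipe: the function name print_scatter_commands, the -Xmx4g heap size, the default file names, and the per-contig output naming are all made up for the example, and the contig names must match those in your reference. The sketch only prints the commands; it does not run them, and it does not attempt the gather step (the per-contig recal tables would still have to be merged into one before recalibrating).

```shell
#!/usr/bin/env bash
# Dry-run sketch of the "scatter" step: print one CountCovariates command per contig.
# GATK_JAR, BAM_FILE, REF_FASTA and KNOWN_SNPS are placeholders; the fallback
# values below are hypothetical and only keep the sketch self-contained.
print_scatter_commands() {
  local contig
  local jar="${GATK_JAR:-GenomeAnalysisTK.jar}"
  local bam="${BAM_FILE:-input.bam}"
  local ref="${REF_FASTA:-ref.fasta}"
  local known="${KNOWN_SNPS:-known_sites.vcf}"
  for contig in "$@"; do
    # -L restricts the run to a single interval (here, a whole contig).
    # -Xmx4g is an assumed per-job heap size, not a tested recommendation.
    printf 'java -Xmx4g -jar %s -T CountCovariates -I %s -R %s -knownSites %s -L %s -recalFile %s.%s.recal.csv\n' \
      "$jar" "$bam" "$ref" "$known" "$contig" "$bam" "$contig"
  done
}

# Example: print the commands for two contigs.
print_scatter_commands chr1 chr2
```

Each printed line can then be run as its own process or job, with a small -nt (or none) per job, since the thread count stops helping well before 16.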
