
CNNScoreVariants, too many threads

Hi,

in the Best Practices workflows you advise using HaplotypeCaller with the "-XX:GCTimeLimit=50" and "-XX:GCHeapFreeLimit=10" Java options.

Is there something similar for CNNScoreVariants? I tried several Java options with different values to limit the threads, but it is nearly impossible. Without any option I get 116 threads when running only one command; with 5 Java options I can limit them to 95 ... still too many! What should I limit here?
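For readers who want to try capping the Java side, here is a minimal sketch of how JVM flags can be passed through the `gatk` wrapper's `--java-options` argument. The flags and values shown (`-XX:ParallelGCThreads`, `-XX:CICompilerCount`) are assumptions chosen for illustration, not the five options tested in this thread; they are standard HotSpot flags that only cap garbage-collector and JIT compiler threads. The snippet only prints the command so it can be inspected before use.

```shell
# Sketch: cap JVM helper threads for one GATK run (values are illustrative assumptions).
gc_threads=2    # hypothetical cap on parallel GC worker threads
jit_threads=2   # hypothetical cap on JIT compiler threads

java_opts="-XX:ParallelGCThreads=${gc_threads} -XX:CICompilerCount=${jit_threads}"

# Build a preview of the command line; nothing is executed here.
preview="gatk --java-options \"${java_opts}\" CNNScoreVariants --help"
echo "${preview}"
```

These flags do not touch the threads created by the Python process that CNNScoreVariants starts, so at best they reduce part of the total.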

Many thanks

Answers

  • manolis Member ✭✭
    edited February 25

    One note: given how our server is set up, I can't use Spark or WDL. I have GATK v4.1.0.0 installed on a Linux server and run it from a bash pipeline.

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @manolis

    Are the 116 threads you mention Java or Python threads? How did you conclude that there are 116 threads, and which tool did you use to determine this?

  • manolis Member ✭✭

    Hi @bhanuGandham I have to check this, good question! I used 'htop' and checked the number of threads before, during and after the job, which was running alone. I will update you! Thanks
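As a complement to reading the numbers off 'htop', the per-process thread count can also be read from `/proc` on Linux. This is a minimal sketch, not the method used in this thread; the helper name `count_threads` is hypothetical, and the demo counts the threads of the current shell rather than of a GATK process.

```shell
# Count the threads (lightweight processes) of one PID on Linux.
count_threads() {
    # Each thread of a process appears as one entry under /proc/<pid>/task.
    ls "/proc/$1/task" | wc -l
}

# Demo on the current shell. For GATK, pass the PID of the java process
# and, separately, the PID of its python child (for example as shown in htop).
n=$(count_threads "$$")
echo "threads in this shell: ${n}"
```

Running it once per PID gives separate Java and Python counts.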

  • manolis Member ✭✭

    Hi @bhanuGandham I checked it (without any Java option/limitation):

    Total threads: 116
        Java = 51
        Python = 65
    

    With some Java options I can decrease the Java threads from 51 to 31. Is there a similar option for Python? If so, where can I add it in your CNNScoreVariants code, or do I have to act directly at the Python level?

    Many thanks for the help!

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited March 11

    Hi @manolis

    To control the Python threads, you can try the advanced argument --inter-op-threads for CNNScoreVariants.
    Please let us know if this resolves the issue. It would be useful information for other users too.

  • manolis Member ✭✭
    edited March 25

    Hi, I tested the "--inter-op-threads" option, but in my case the number of threads did not change.
    If I use a host with 30 cores the CNN tool generates around 116 total threads. If I use one with 62 cores it generates around 203 total threads.
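The two measurements above suggest that the thread count grows with the number of cores, though not in strict proportion. A small worked calculation of threads per core, using only the figures reported in this post (the helper name `ratio` is hypothetical):

```shell
# Threads per core for the two hosts reported above.
ratio() { awk -v t="$1" -v c="$2" 'BEGIN { printf "%.2f", t / c }'; }

small=$(ratio 116 30)   # 30-core host, about 116 threads
large=$(ratio 203 62)   # 62-core host, about 203 threads
echo "30 cores: ${small} threads/core; 62 cores: ${large} threads/core"
```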

    The "--inter-op-threads" values that I tested are: 1, 2, 5, 10, 30, 60. I also tried "--intra-op-threads", but it made no difference.

    Does any other user have the same problem? ... hoping for a solution ...

    Sorry for my delayed answer. Thanks @bhanuGandham

    Here is the code:

    source activate gatk4100
    
    # Note: bash variable names cannot contain dots (${input.bam} is a bad substitution),
    # so the placeholders below use underscores instead.
    /share/apps/bio/gatk-4.1.0.0/gatk CNNScoreVariants \
    --inter-op-threads "${a}" \
    --intra-op-threads "${b}" \
    -R "${hg38}" \
    -I "${input_bam}" \
    -V "${input_vcf}" \
    -O "${output_cnn2_vcf}" \
    -L "${interval}" \
    --inference-batch-size 8 \
    --transfer-batch-size 32 \
    --tensor-type read_tensor \
    --tmp-dir "${tmp}"
    
    conda deactivate
    
  • cnorman United States Member, Broadie, Dev ✭✭
    edited March 29

    Hi @manolis. One other thing to experiment with would be trying some small values for the environment variable OMP_NUM_THREADS (e.g. export OMP_NUM_THREADS=1, or 2). I was able to reduce the number of threads used in the Python process this way on my local machine. If that works, I would recommend caution when running other GATK tools with the variable set, as it could affect other GATK components. Please let us know if that works; I'd be interested in knowing the results.
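One way to follow the caution above is to set the variable for a single command only, rather than exporting it for the whole session. This is a minimal sketch of that shell pattern; `sh -c` stands in for the real `gatk CNNScoreVariants` call, and whether the tool's Python process honors the variable on a given server is exactly what would need to be tested.

```shell
# Sketch: set OMP_NUM_THREADS for one command only.
# "sh -c ..." is a stand-in for the real gatk CNNScoreVariants invocation.
seen=$(OMP_NUM_THREADS=1 sh -c 'echo "${OMP_NUM_THREADS}"')
echo "inside the command: ${seen}"

# The calling shell keeps whatever value it had before (often unset).
echo "in the calling shell: ${OMP_NUM_THREADS:-unset}"
```

A variable prefix like this is visible only to that one process and its children, so other GATK tools launched later from the same pipeline are not affected.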

  • manolis Member ✭✭

    Many thanks @cnorman

    I will try it and let you know! I just need a few weeks to give you an answer: I do not have root-level access to the server and I have to ask our support team.

    Thanks

  • manolis Member ✭✭

    great, thanks!

  • manolis Member ✭✭

    Hi @cnorman,

    in the end, unfortunately, I cannot test your suggestion. I have to wait for issue 5846.

    Many thanks
