CNNScoreVariants, too many threads

Hi,

in the Best Practices workflows you advise using HaplotypeCaller with the "-XX:GCTimeLimit=50" and "-XX:GCHeapFreeLimit=10" Java options.

Is there something similar for CNNScoreVariants? I tried several Java options with different values to limit the number of threads, but it is nearly impossible. Without any options I get 116 threads when running a single command; with 5 Java options I can bring that down to 95, which is still too many. What should I limit here?

Many thanks

Answers

  • manolis Member ✭✭
    edited February 25

    Just to add: given the setup of our server, I can't use Spark or WDL. I have GATK v4.1.0.0 installed on a Linux server and run it from a bash pipeline.

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @manolis

    Are the 116 threads you mention Java or Python threads? How did you conclude that there are 116 threads, and which tool did you use to count them?

  • manolis Member ✭✭

    Hi @bhanuGandham, good question, I have to check this! I used 'htop' and compared the number of threads before, during and after the job, with nothing else running. I will update you! Thanks
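
    The Java/Python split asked about above can also be read without htop. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names threads_of_pid and threads_of_command are made up for illustration, and the command names (java, python) are assumptions that may differ on your system (for example python3).

    ```shell
    #!/usr/bin/env bash
    # Print the number of threads of one process, read from /proc (Linux only).
    threads_of_pid() {
        awk '/^Threads:/ {print $2}' "/proc/$1/status" 2>/dev/null
    }

    # Sum the threads of every process whose command name equals $1 exactly.
    threads_of_command() {
        local name=$1 total=0 dir n
        for dir in /proc/[0-9]*; do
            # Skip processes with a different name or that have already exited.
            [[ $(cat "$dir/comm" 2>/dev/null) == "$name" ]] || continue
            n=$(threads_of_pid "${dir#/proc/}")
            total=$(( total + ${n:-0} ))
        done
        echo "$total"
    }

    echo "java:   $(threads_of_command java)"
    echo "python: $(threads_of_command python)"
    ```

    Running it before, during and after the job gives the same kind of comparison as the htop check, but split by process type.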

  • manolis Member ✭✭

    Hi @bhanuGandham, I checked it (without any Java options or limits):

    Total threads: 116
        Java = 51
        Python = 65

    With some Java options I can reduce the Java threads from 51 to 31. Is there a similar option for Python? If so, where can I add it in your CNNScoreVariants code, or do I have to act directly at the Python level?

    Many thanks for the help!
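
    For readers wondering how Java options of this kind are passed: the post does not say which options were used, so the flags below are only an assumed selection of standard HotSpot settings that cap garbage-collector and JIT-compiler helper threads. The sketch prints the command rather than running it, and the remaining CNNScoreVariants arguments are left elided.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical selection of HotSpot flags that cap JVM helper threads;
    # the options actually tried in the post above are not stated.
    java_opts="-XX:ParallelGCThreads=1 -XX:ConcGCThreads=1 -XX:CICompilerCount=2"

    # Print the command instead of running it; the other arguments are elided.
    printf 'gatk --java-options "%s" CNNScoreVariants ...\n' "$java_opts"
    ```

    These flags only affect threads created by the JVM itself; they are not expected to change the thread count of the Python process.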

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited March 11

    Hi @manolis

    To control the Python threads, you can try the advanced argument --inter-op-threads for CNNScoreVariants.
    Please let us know if this resolves the issue. It would be useful information for other users too.

  • manolis Member ✭✭
    edited March 25

    Hi, I tested the "--inter-op-threads" option, but in my case the number of threads did not change.
    On a host with 30 cores the CNN tool generates around 116 threads in total; on one with 62 cores it generates around 203.

    The "--inter-op-threads" values I tested are 1, 2, 5, 10, 30 and 60. I also tried "--intra-op-threads", but it made no difference.

    Does any other user have the same problem? I'm hoping for a solution.

    Sorry for my late reply. Thanks @bhanuGandham

    Here is the code:

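    # Activate the conda environment that provides GATK's Python dependencies
    source activate gatk4100
    
    # a and b hold the thread counts being tested.
    # Variable names use underscores because bash rejects names such as ${input.bam}.
    /share/apps/bio/gatk-4.1.0.0/gatk CNNScoreVariants \
    --inter-op-threads "${a}" \
    --intra-op-threads "${b}" \
    -R "${hg38}" \
    -I "${input_bam}" \
    -V "${input_vcf}" \
    -O "${output_cnn2_vcf}" \
    -L "${interval}" \
    --inference-batch-size 8 \
    --transfer-batch-size 32 \
    --tensor-type read_tensor \
    --tmp-dir "${tmp}"
    
    conda deactivate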
    
  • cnorman United States Member, Broadie, Dev ✭✭
    edited March 29

    Hi @manolis. One other thing to experiment with would be setting the environment variable OMP_NUM_THREADS to a small value (e.g. export OMP_NUM_THREADS=1, or 2). I was able to reduce the number of threads used by the Python process this way on my local machine. If that works, I would recommend caution when running other GATK tools, as the setting could affect other GATK components. Please let us know if it works; I'd be interested in the results.
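
    One way to act on the caution above is to set the variable for a single command only, rather than exporting it for the whole session. This is a minimal sketch: the wrapper name run_with_omp_threads is made up for illustration, and whether OMP_NUM_THREADS actually lowers the CNNScoreVariants thread count on a given system is not confirmed in this thread.

    ```shell
    #!/usr/bin/env bash
    # Run one command with OMP_NUM_THREADS set only for that command.
    # Usage: run_with_omp_threads <n> <command> [args...]
    run_with_omp_threads() {
        local n=$1
        shift
        # env applies the variable to the child process only;
        # the calling shell and later commands are unaffected.
        env OMP_NUM_THREADS="$n" "$@"
    }

    # Demonstration with a harmless command instead of GATK:
    run_with_omp_threads 1 sh -c 'echo "inside: ${OMP_NUM_THREADS:-unset}"'
    ```

    In a pipeline, the same wrapper would be placed in front of the gatk CNNScoreVariants call, so that other GATK tools in the script keep their default behaviour.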

  • manolis Member ✭✭

    Many thanks @cnorman

    I will try it and let you know! I need a few weeks to give you an answer, though: I do not have root access to the server and have to ask our support team.

    Thanks

  • manolis Member ✭✭

    great, thanks!

  • manolis Member ✭✭

    Hi @cnorman,

    In the end, unfortunately, I cannot test your suggestion. I have to wait for issue 5846.

    Many thanks
