Error in GermlineCNVCaller: Anomalous ploidy and karyotypes

ngerald Member
edited May 31 in Ask the GATK team
I ran GermlineCNVCaller in the GATK4 Docker image on data from Illumina WES runs.
I saw one post about this from 2018, but no solution was provided, so I was hoping someone has figured out what might be causing this issue. It says that anomalous ploidy (3) and karyotypes were found.
Am I required to provide separate contig-ploidy-priors files for male and female samples?
Is there a way to circumvent the errors and proceed with the CNV calls?

Here is the error log on the terminal:
-------------------------------------------------------------------------------------------------------------------------------------------------------
```
[May 30, 2019 9:34:28 PM UTC] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 42.22 minutes.
Runtime.totalMemory()=309329920
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 137
Command Line: python /tmp/cohort_denoising_calling.2469363531812992060.py --ploidy_calls_path=/gatk/contig_ploidy_out/201to221-calls --output_calls_path=/gatk/cnv_caller_out/201to221-calls --output_tracking_path=/gatk/cnv_caller_out/201to221-tracking --modeling_interval_list=/tmp/intervals5455361474614479137.tsv --output_model_path=/gatk/cnv_caller_out/201to221-model --enable_explicit_gc_bias_modeling=False --read_count_tsv_files /tmp/sample-06897365439090412481.tsv /tmp/sample-14321148589333665336.tsv /tmp/sample-21835961979843646097.tsv /tmp/sample-38074673515857876969.tsv /tmp/sample-43743553031942260664.tsv /tmp/sample-57298179702079672321.tsv /tmp/sample-62031280085994514055.tsv /tmp/sample-75741767774624679683.tsv /tmp/sample-81194219972171310383.tsv /tmp/sample-97680992559618886592.tsv /tmp/sample-107437152082991706984.tsv /tmp/sample-111888210707192633556.tsv /tmp/sample-128036150598221044845.tsv /tmp/sample-138872009798693940440.tsv /tmp/sample-14940235851191146248.tsv /tmp/sample-154069286361387789329.tsv /tmp/sample-166690524464389231566.tsv /tmp/sample-17146091880304952416.tsv /tmp/sample-187389732363112723677.tsv /tmp/sample-192301262323034965667.tsv --psi_s_scale=1.000000e-04 --mapping_error_rate=1.000000e-02 --depth_correction_tau=1.000000e+04 --q_c_expectation_mode=hybrid --max_bias_factors=5 --psi_t_scale=1.000000e-03 --log_mean_bias_std=1.000000e-01 --init_ard_rel_unexplained_variance=1.000000e-01 --num_gc_bins=20 --gc_curve_sd=1.000000e+00 --active_class_padding_hybrid_mode=50000 --enable_bias_factors=True --disable_bias_factors_in_active_class=False --p_alt=1.000000e-06 --cnv_coherence_length=1.000000e+04 --max_copy_number=5 --p_active=0.010000 --class_coherence_length=10000.000000 --learning_rate=1.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.900000e-01 --log_emission_samples_per_round=50 --log_emission_sampling_rounds=10 --log_emission_sampling_median_rel_error=5.000000e-03 --max_advi_iter_first_epoch=5000 
--max_advi_iter_subsequent_epochs=200 --min_training_epochs=10 --max_training_epochs=50 --initial_temperature=1.500000e+00 --num_thermal_advi_iters=2500 --convergence_snr_averaging_window=500 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=10 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=1.000000e+00 --disable_caller=false --disable_sampler=false --disable_annealing=false
Stdout: 20:52:35.963 INFO cohort_denoising_calling - Loading 20 read counts file(s)...
20:52:57.460 INFO gcnvkernel.io.io_metadata - Loading germline contig ploidy and global read depth metadata...
20:52:57.467 WARNING gcnvkernel.structs.metadata - Sample 219-Exp29_S115 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.468 WARNING gcnvkernel.structs.metadata - Sample 219-Exp29_S115 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.472 WARNING gcnvkernel.structs.metadata - Sample 220-Exp29_S116 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.472 WARNING gcnvkernel.structs.metadata - Sample 220-Exp29_S116 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.476 WARNING gcnvkernel.structs.metadata - Sample 208-Exp29_S104 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.476 WARNING gcnvkernel.structs.metadata - Sample 208-Exp29_S104 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.481 WARNING gcnvkernel.structs.metadata - Sample 221-Exp29_S117 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.481 WARNING gcnvkernel.structs.metadata - Sample 221-Exp29_S117 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.486 WARNING gcnvkernel.structs.metadata - Sample 213-Exp29_S109 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.486 WARNING gcnvkernel.structs.metadata - Sample 213-Exp29_S109 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.491 WARNING gcnvkernel.structs.metadata - Sample 201-Exp29_S97 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.491 WARNING gcnvkernel.structs.metadata - Sample 201-Exp29_S97 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.495 WARNING gcnvkernel.structs.metadata - Sample 209-Exp29_S105 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.495 WARNING gcnvkernel.structs.metadata - Sample 209-Exp29_S105 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.499 WARNING gcnvkernel.structs.metadata - Sample 203-Exp29_S99 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.499 WARNING gcnvkernel.structs.metadata - Sample 203-Exp29_S99 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.504 WARNING gcnvkernel.structs.metadata - Sample 212-Exp29_S108 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.504 WARNING gcnvkernel.structs.metadata - Sample 212-Exp29_S108 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.508 WARNING gcnvkernel.structs.metadata - Sample 204-Exp29_S100 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.509 WARNING gcnvkernel.structs.metadata - Sample 204-Exp29_S100 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.513 WARNING gcnvkernel.structs.metadata - Sample 218-Exp29_S114 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.513 WARNING gcnvkernel.structs.metadata - Sample 218-Exp29_S114 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.517 WARNING gcnvkernel.structs.metadata - Sample 215-Exp29_S111 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
20:52:57.517 WARNING gcnvkernel.structs.metadata - Sample 215-Exp29_S111 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unrelia
Stderr:
at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.executeGermlineCNVCallerPythonScript(GermlineCNVCaller.java:441)
at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:288)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
```
---------------------------------------------------------------------------------------------------------------------------------------------------------
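The warnings in the log above repeat the same two messages for every sample, which makes the pattern hard to see. A minimal sketch of how one might tabulate the reported karyotypes is given here; it assumes the warning wording is exactly as shown in the log, and the function names `parse_karyotype_warnings` and `summarize` are hypothetical helpers, not part of GATK or gcnvkernel.

```python
import ast
import re
from collections import Counter

# Matches the karyotype warning wording as it appears in the log above.
KARYOTYPE_RE = re.compile(
    r"Sample (?P<sample>\S+) has an anomalous karyotype "
    r"\((?P<karyotype>\{[^}]*\})\)"
)


def parse_karyotype_warnings(log_text):
    """Return {sample_name: karyotype_dict} for each karyotype warning found."""
    calls = {}
    for match in KARYOTYPE_RE.finditer(log_text):
        # The karyotype is printed as a Python dict literal, e.g. {'Y': 1, 'X': 2}.
        calls[match.group("sample")] = ast.literal_eval(match.group("karyotype"))
    return calls


def summarize(calls):
    """Count how many samples share each (X copies, Y copies) pair."""
    return Counter((k.get("X"), k.get("Y")) for k in calls.values())


if __name__ == "__main__":
    # Two warning lines copied from the log, with the trailing advice abbreviated.
    example = (
        "20:52:57.468 WARNING gcnvkernel.structs.metadata - Sample 219-Exp29_S115 "
        "has an anomalous karyotype ({'Y': 1, 'X': 2}). (abbreviated)\n"
        "20:52:57.476 WARNING gcnvkernel.structs.metadata - Sample 208-Exp29_S104 "
        "has an anomalous karyotype ({'Y': 0, 'X': 3}). (abbreviated)\n"
    )
    calls = parse_karyotype_warnings(example)
    for (x, y), count in sorted(summarize(calls).items()):
        print(f"X={x} Y={y}: {count} sample(s)")
```

Run against the full log, a summary like this would show whether every sample is reported with one extra X copy, which is easier to raise with the developers than the raw warning list.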

Best Answers

  • slee ✭✭✭
    Accepted Answer

    Hi @ngerald, on a recent benchmarking run of 50 WES samples over ~220k target intervals, I ran with 45 shards of ~5k intervals each. Each shard was run on a GCE n1-standard-1 VM with 3.75GB memory, yielding a cost of ~0.5 cents per sample. With 20 samples and 16GB memory, you might be able to get away with shards of ~50k intervals (although I would probably try ~25k first).

    Sharding is performed only at the GermlineCNVCaller step; i.e., the multiple interval files produced by IntervalListTools are needed only as the -L input to each GermlineCNVCaller shard. The sharded results are then combined by PostprocessGermlineCNVCalls, so you can run CollectReadCounts over the original interval list. It might be worthwhile to study the WDL referenced by the tutorial to see how the workflow design dictates how the tools should be run.
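
The shard arithmetic in this answer can be sketched as follows. This is only an illustration of the counting, assuming equal-sized shards cut by interval index; the actual splitting is done by IntervalListTools as described above, and `plan_shards` is a hypothetical helper name. The ~220k figure is the benchmarking cohort from the answer, not the poster's interval count.

```python
import math


def plan_shards(num_intervals, intervals_per_shard):
    """Return (start, end) index pairs, end exclusive, one pair per shard."""
    if num_intervals < 0 or intervals_per_shard <= 0:
        raise ValueError("need num_intervals >= 0 and intervals_per_shard > 0")
    num_shards = math.ceil(num_intervals / intervals_per_shard)
    return [
        (i * intervals_per_shard,
         min((i + 1) * intervals_per_shard, num_intervals))
        for i in range(num_shards)
    ]


if __name__ == "__main__":
    # Approximate figures quoted in the answer: ~220k intervals, ~5k per shard.
    for size in (5_000, 25_000, 50_000):
        shards = plan_shards(220_000, size)
        print(f"{size} intervals per shard -> {len(shards)} shard(s)")
```

With exactly 220,000 intervals and 5,000 per shard this gives 44 shards, in line with the roughly 45 mentioned above given that both figures are approximate. Larger shards mean fewer GermlineCNVCaller runs but more memory per run.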

Answers

  • ngerald Member
    Could it be because I'm using only 20 samples instead of 100+?
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @ngerald

    I don't see any errors; all I see are warnings, which are fine. Did you get an output file?

  • ngerald Member
    Hi @bhanuGandham!

    There is an error towards the bottom:
    ```
    Stderr:
    at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
    at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
    at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.executeGermlineCNVCallerPythonScript(GermlineCNVCaller.java:441)
    at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:288)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    ```

    There was no output since the python script exited with the exit code 137.
    I believe the warnings at the end resulted in the script failing.

    Q) Do I need to supply separate contig-ploidy-priors files for males and females if my cohort has samples of both sexes?

    Q) Is there a way to let the script keep running even with the warnings?

    I could send you the entire process log file if required!
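
A note on the exit code reported in the log: on POSIX systems, a status above 128 conventionally means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9, SIGKILL. A process killed this way is often the victim of the kernel's out-of-memory killer or a container memory limit, although the log alone does not confirm that here. A minimal sketch of the decoding, assuming a Linux or other POSIX host (`describe_exit_code` is a hypothetical helper name):

```python
import signal


def describe_exit_code(code):
    """Explain a shell-style exit status using the 128 + signal convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not correspond to a signal known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

If the kill was memory-related, the warnings about ploidy and karyotype would be unrelated to the failure, and giving the container more memory or processing fewer intervals per run would be the things to try.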
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @ngerald

    I am sorry I missed that. Our dev team is looking into this and will get back to you shortly.

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @ngerald

    Would you please post the exact command you are using and the entire error log?

  • ngerald Member
    # COMMAND:
    ```
    gatk GermlineCNVCaller --run-mode COHORT \
      -L my_data/Ancestry_004_probes_hg19.excludedline.bed \
      --interval-merging-rule OVERLAPPING_ONLY \
      --contig-ploidy-calls contig_ploidy_out/201to221-calls/ \
      --input my_data/201.counts.tsv \
      --input my_data/203.counts.tsv \
      --input my_data/204.counts.tsv \
      --input my_data/205.counts.tsv \
      --input my_data/206.counts.tsv \
      --input my_data/207.counts.tsv \
      --input my_data/208.counts.tsv \
      --input my_data/209.counts.tsv \
      --input my_data/210.counts.tsv \
      --input my_data/211.counts.tsv \
      --input my_data/212.counts.tsv \
      --input my_data/213.counts.tsv \
      --input my_data/214.counts.tsv \
      --input my_data/215.counts.tsv \
      --input my_data/216.counts.tsv \
      --input my_data/217.counts.tsv \
      --input my_data/218.counts.tsv \
      --input my_data/219.counts.tsv \
      --input my_data/220.counts.tsv \
      --input my_data/221.counts.tsv \
      --output cnv_caller_out/ \
      --output-prefix 201to221
    ```
    --------------------------------------------------------------------------------------------------------------------------------
    # COMPLETE LOG:

    '''
    Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.0.0-local.jar GermlineCNVCaller --run-mode COHORT -L my_data/Ancestry_004_probes_hg19.excludedline.bed --interval-merging-rule OVERLAPPING_ONLY --contig-ploidy-calls contig_ploidy_out/201to221-calls/ --input my_data/201.counts.tsv --input my_data/203.counts.tsv --input my_data/204.counts.tsv --input my_data/205.counts.tsv --input my_data/206.counts.tsv --input my_data/207.counts.tsv --input my_data/208.counts.tsv --input my_data/209.counts.tsv --input my_data/210.counts.tsv --input my_data/211.counts.tsv --input my_data/212.counts.tsv --input my_data/213.counts.tsv --input my_data/214.counts.tsv --input my_data/215.counts.tsv --input my_data/216.counts.tsv --input my_data/217.counts.tsv --input my_data/218.counts.tsv --input my_data/219.counts.tsv --input my_data/220.counts.tsv --input my_data/221.counts.tsv --output cnv_caller_out/ --output-prefix 201to221
    20:52:15.082 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    20:52:16.747 INFO GermlineCNVCaller - ------------------------------------------------------------
    20:52:16.748 INFO GermlineCNVCaller - The Genome Analysis Toolkit (GATK) v4.1.0.0
    20:52:16.748 INFO GermlineCNVCaller - Executing as [email protected] on Linux v4.9.125-linuxkit amd64
    20:52:16.749 INFO GermlineCNVCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
    20:52:16.749 INFO GermlineCNVCaller - Start Date/Time: May 30, 2019 8:52:15 PM UTC
    20:52:16.749 INFO GermlineCNVCaller - ------------------------------------------------------------
    20:52:16.749 INFO GermlineCNVCaller - ------------------------------------------------------------
    20:52:16.750 INFO GermlineCNVCaller - HTSJDK Version: 2.18.2
    20:52:16.750 INFO GermlineCNVCaller - Picard Version: 2.18.25
    20:52:16.750 INFO GermlineCNVCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    20:52:16.750 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    20:52:16.750 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    20:52:16.750 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    20:52:16.751 INFO GermlineCNVCaller - Deflater: IntelDeflater
    20:52:16.751 INFO GermlineCNVCaller - Inflater: IntelInflater
    20:52:16.751 INFO GermlineCNVCaller - GCS max retries/reopens: 20
    20:52:16.751 INFO GermlineCNVCaller - Requester pays: disabled
    20:52:16.751 INFO GermlineCNVCaller - Initializing engine
    20:52:18.726 INFO GermlineCNVCaller - Done initializing engine
    log4j:WARN No appenders could be found for logger (org.broadinstitute.hdf5.HDF5Library).
    log4j:WARN Please initialize the log4j system properly.
    20:52:19.121 INFO GermlineCNVCaller - Intervals specified...
    20:52:19.394 INFO FeatureManager - Using codec BEDCodec to read file file:///gatk/my_data/Ancestry_004_probes_hg19.excludedline.bed
    20:52:20.265 INFO IntervalArgumentCollection - Processing 66890448 bp from intervals
    20:52:20.396 INFO GermlineCNVCaller - No annotated intervals were provided...
    20:52:20.397 INFO GermlineCNVCaller - No GC-content annotations for intervals found; explicit GC-bias correction will not be performed...
    20:52:20.636 INFO GermlineCNVCaller - Running the tool in the COHORT mode...
    20:52:20.637 INFO GermlineCNVCaller - Validating and aggregating data from input read-count files...
    20:52:20.737 INFO GermlineCNVCaller - Aggregating read-count file my_data/201.counts.tsv (1 / 20)
    20:52:21.601 INFO GermlineCNVCaller - Aggregating read-count file my_data/203.counts.tsv (2 / 20)
    20:52:22.174 INFO GermlineCNVCaller - Aggregating read-count file my_data/204.counts.tsv (3 / 20)
    20:52:22.882 INFO GermlineCNVCaller - Aggregating read-count file my_data/205.counts.tsv (4 / 20)
    20:52:23.325 INFO GermlineCNVCaller - Aggregating read-count file my_data/206.counts.tsv (5 / 20)
    20:52:24.049 INFO GermlineCNVCaller - Aggregating read-count file my_data/207.counts.tsv (6 / 20)
    20:52:24.456 INFO GermlineCNVCaller - Aggregating read-count file my_data/208.counts.tsv (7 / 20)
    20:52:25.175 INFO GermlineCNVCaller - Aggregating read-count file my_data/209.counts.tsv (8 / 20)
    20:52:25.601 INFO GermlineCNVCaller - Aggregating read-count file my_data/210.counts.tsv (9 / 20)
    20:52:26.320 INFO GermlineCNVCaller - Aggregating read-count file my_data/211.counts.tsv (10 / 20)
    20:52:26.722 INFO GermlineCNVCaller - Aggregating read-count file my_data/212.counts.tsv (11 / 20)
    20:52:27.199 INFO GermlineCNVCaller - Aggregating read-count file my_data/213.counts.tsv (12 / 20)
    20:52:27.665 INFO GermlineCNVCaller - Aggregating read-count file my_data/214.counts.tsv (13 / 20)
    20:52:28.446 INFO GermlineCNVCaller - Aggregating read-count file my_data/215.counts.tsv (14 / 20)
    20:52:28.862 INFO GermlineCNVCaller - Aggregating read-count file my_data/216.counts.tsv (15 / 20)
    20:52:29.321 INFO GermlineCNVCaller - Aggregating read-count file my_data/217.counts.tsv (16 / 20)
    20:52:29.992 INFO GermlineCNVCaller - Aggregating read-count file my_data/218.counts.tsv (17 / 20)
    20:52:30.514 INFO GermlineCNVCaller - Aggregating read-count file my_data/219.counts.tsv (18 / 20)
    20:52:30.911 INFO GermlineCNVCaller - Aggregating read-count file my_data/220.counts.tsv (19 / 20)
    20:52:31.679 INFO GermlineCNVCaller - Aggregating read-count file my_data/221.counts.tsv (20 / 20)
    21:34:28.219 INFO GermlineCNVCaller - Shutting down engine
    [May 30, 2019 9:34:28 PM UTC] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 42.22 minutes.
    Runtime.totalMemory()=309329920
    org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
    python exited with 137
    Command Line: python /tmp/cohort_denoising_calling.2469363531812992060.py --ploidy_calls_path=/gatk/contig_ploidy_out/201to221-calls --output_calls_path=/gatk/cnv_caller_out/201to221-calls --output_tracking_path=/gatk/cnv_caller_out/201to221-tracking --modeling_interval_list=/tmp/intervals5455361474614479137.tsv --output_model_path=/gatk/cnv_caller_out/201to221-model --enable_explicit_gc_bias_modeling=False --read_count_tsv_files /tmp/sample-06897365439090412481.tsv /tmp/sample-14321148589333665336.tsv /tmp/sample-21835961979843646097.tsv /tmp/sample-38074673515857876969.tsv /tmp/sample-43743553031942260664.tsv /tmp/sample-57298179702079672321.tsv /tmp/sample-62031280085994514055.tsv /tmp/sample-75741767774624679683.tsv /tmp/sample-81194219972171310383.tsv /tmp/sample-97680992559618886592.tsv /tmp/sample-107437152082991706984.tsv /tmp/sample-111888210707192633556.tsv /tmp/sample-128036150598221044845.tsv /tmp/sample-138872009798693940440.tsv /tmp/sample-14940235851191146248.tsv /tmp/sample-154069286361387789329.tsv /tmp/sample-166690524464389231566.tsv /tmp/sample-17146091880304952416.tsv /tmp/sample-187389732363112723677.tsv /tmp/sample-192301262323034965667.tsv --psi_s_scale=1.000000e-04 --mapping_error_rate=1.000000e-02 --depth_correction_tau=1.000000e+04 --q_c_expectation_mode=hybrid --max_bias_factors=5 --psi_t_scale=1.000000e-03 --log_mean_bias_std=1.000000e-01 --init_ard_rel_unexplained_variance=1.000000e-01 --num_gc_bins=20 --gc_curve_sd=1.000000e+00 --active_class_padding_hybrid_mode=50000 --enable_bias_factors=True --disable_bias_factors_in_active_class=False --p_alt=1.000000e-06 --cnv_coherence_length=1.000000e+04 --max_copy_number=5 --p_active=0.010000 --class_coherence_length=10000.000000 --learning_rate=1.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.900000e-01 --log_emission_samples_per_round=50 --log_emission_sampling_rounds=10 --log_emission_sampling_median_rel_error=5.000000e-03 --max_advi_iter_first_epoch=5000 
--max_advi_iter_subsequent_epochs=200 --min_training_epochs=10 --max_training_epochs=50 --initial_temperature=1.500000e+00 --num_thermal_advi_iters=2500 --convergence_snr_averaging_window=500 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=10 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=1.000000e+00 --disable_caller=false --disable_sampler=false --disable_annealing=false
    Stdout: 20:52:35.963 INFO cohort_denoising_calling - Loading 20 read counts file(s)...
    20:52:57.460 INFO gcnvkernel.io.io_metadata - Loading germline contig ploidy and global read depth metadata...
    20:52:57.467 WARNING gcnvkernel.structs.metadata - Sample 219-Exp29_S115 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.468 WARNING gcnvkernel.structs.metadata - Sample 219-Exp29_S115 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.472 WARNING gcnvkernel.structs.metadata - Sample 220-Exp29_S116 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.472 WARNING gcnvkernel.structs.metadata - Sample 220-Exp29_S116 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.476 WARNING gcnvkernel.structs.metadata - Sample 208-Exp29_S104 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.476 WARNING gcnvkernel.structs.metadata - Sample 208-Exp29_S104 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.481 WARNING gcnvkernel.structs.metadata - Sample 221-Exp29_S117 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.481 WARNING gcnvkernel.structs.metadata - Sample 221-Exp29_S117 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.486 WARNING gcnvkernel.structs.metadata - Sample 213-Exp29_S109 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.486 WARNING gcnvkernel.structs.metadata - Sample 213-Exp29_S109 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.491 WARNING gcnvkernel.structs.metadata - Sample 201-Exp29_S97 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.491 WARNING gcnvkernel.structs.metadata - Sample 201-Exp29_S97 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.495 WARNING gcnvkernel.structs.metadata - Sample 209-Exp29_S105 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.495 WARNING gcnvkernel.structs.metadata - Sample 209-Exp29_S105 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.499 WARNING gcnvkernel.structs.metadata - Sample 203-Exp29_S99 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.499 WARNING gcnvkernel.structs.metadata - Sample 203-Exp29_S99 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.504 WARNING gcnvkernel.structs.metadata - Sample 212-Exp29_S108 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.504 WARNING gcnvkernel.structs.metadata - Sample 212-Exp29_S108 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.508 WARNING gcnvkernel.structs.metadata - Sample 204-Exp29_S100 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.509 WARNING gcnvkernel.structs.metadata - Sample 204-Exp29_S100 has an anomalous karyotype ({'Y': 1, 'X': 2}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.513 WARNING gcnvkernel.structs.metadata - Sample 218-Exp29_S114 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.513 WARNING gcnvkernel.structs.metadata - Sample 218-Exp29_S114 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.517 WARNING gcnvkernel.structs.metadata - Sample 215-Exp29_S111 has an anomalous ploidy (3) for contig 19. The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unreliable ploidy designations. It is recommended that the user verifies this designation by orthogonal methods.
    20:52:57.517 WARNING gcnvkernel.structs.metadata - Sample 215-Exp29_S111 has an anomalous karyotype ({'Y': 0, 'X': 3}). The presence of unmasked PAR regions and regions of low mappability in the coverage metadata can result in unrelia
    Stderr:
    at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
    at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
    at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.executeGermlineCNVCallerPythonScript(GermlineCNVCaller.java:441)
    at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:288)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    ```
  • bhanuGandhambhanuGandham Cambridge, MA Member, Administrator, Broadie, Moderator admin

    Hi @ngerald

    Looks like the trouble is with the way the intervals are sharded, which is causing you to run out of memory. Take a look at this tutorial for information on interval sharding: https://software.broadinstitute.org/gatk/documentation/article?id=11684
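
For context on why this points to memory rather than to the ploidy warnings: the log reports `python exited with 137`. By shell convention, an exit status above 128 encodes 128 plus the number of the signal that terminated the process, so 137 corresponds to signal 9 (SIGKILL). That is the signal the Linux out-of-memory killer typically sends, and a container that exceeds its memory limit is usually stopped the same way. A minimal sketch of that decoding, where the helper name `describe_exit_status` is hypothetical and not part of GATK:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status into a short description."""
    if status > 128:
        # Statuses above 128 conventionally mean "terminated by signal (status - 128)".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


# 137 is the status reported in the log above.
print(describe_exit_status(137))
```

On Linux this reports SIGKILL for 137. The exit status alone does not prove an out-of-memory kill, so it is worth confirming against the container or system logs.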

  • ngeraldngerald Member
    Hi @bhanuGandham,
    Thanks for the heads-up. Will check this out!
  • ngeraldngerald Member
    Thanks @slee and @bhanuGandham!

    If I use the scatter option from IntervalListTools, how large would you suggest I set the 'scatter content' to, given that I have about 420k intervals and 16 GB of memory?

    Also, if the intervals are scattered into multiple interval files, would I need to run each of them individually through CollectReadCounts and GermlineCNVCaller and then merge all the results? Is there a way to specify the folder containing the interval files?
  • sleeslee Member, Broadie, Dev ✭✭✭
    Accepted Answer

    Hi @ngerald, on a recent benchmarking run of 50 WES samples over ~220k target intervals, I ran with 45 shards of ~5k intervals each. Each shard was run on a GCE n1-standard-1 VM with 3.75GB memory, yielding a cost of ~0.5 cents per sample. With 20 samples and 16GB memory, you might be able to get away with shards of ~50k intervals (although I would probably try ~25k first).
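
As a rough illustration of the sizing arithmetic, the number of shards is the interval count divided by the shard size, rounded up. The sketch below applies that to the ~420k intervals mentioned earlier in the thread and the two shard sizes suggested above. The function name `shard_count` is hypothetical, and the actual number of files IntervalListTools writes may differ slightly depending on how it divides the list.

```python
import math


def shard_count(total_intervals: int, intervals_per_shard: int) -> int:
    """Number of shards needed so that no shard exceeds intervals_per_shard."""
    return math.ceil(total_intervals / intervals_per_shard)


# ~420k intervals, with the two candidate shard sizes from the answer above.
for size in (25_000, 50_000):
    print(f"{size} intervals per shard -> {shard_count(420_000, size)} shards")
```

Smaller shards lower the peak memory of each GermlineCNVCaller run at the cost of more runs, which is why trying the smaller size first is the safer option.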

    Sharding is only performed in the GermlineCNVCaller step; i.e., the multiple interval files produced by IntervalListTools are only needed as -L input to each GermlineCNVCaller shard. These sharded results are then combined by PostprocessGermlineCNVCalls. So you can run CollectReadCounts over the original interval list. It might be worthwhile to study the WDL referenced by the tutorial to get an idea of how the workflow design dictates how the tools should be run.
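
The per-shard step described above could be scripted roughly as follows. This is only a sketch that builds and prints one GermlineCNVCaller command line per shard without running anything. The read-count file names and shard interval-list paths are hypothetical placeholders, the ploidy-calls and output paths are taken from the log in the original post, and the arguments shown should be checked against the documentation for your GATK version.

```python
import shlex
from typing import List, Sequence


def gcnv_shard_command(
    shard_intervals: str,
    count_files: Sequence[str],
    ploidy_calls: str,
    output_dir: str,
    output_prefix: str,
) -> List[str]:
    """Build the argument list for one GermlineCNVCaller shard in COHORT mode."""
    cmd = [
        "gatk", "GermlineCNVCaller",
        "--run-mode", "COHORT",
        "-L", shard_intervals,  # the only per-shard input
        "--interval-merging-rule", "OVERLAPPING_ONLY",
        "--contig-ploidy-calls", ploidy_calls,
        "--output", output_dir,
        "--output-prefix", output_prefix,
    ]
    # The same read-count files (collected over the ORIGINAL interval list)
    # are passed to every shard.
    for path in count_files:
        cmd += ["-I", path]
    return cmd


# Hypothetical inputs: adjust to your own file layout.
count_files = ["counts/sample_01.counts.tsv", "counts/sample_02.counts.tsv"]
shard_lists = [f"scatter/shard_{i:04d}.interval_list" for i in range(1, 4)]

commands = [
    gcnv_shard_command(
        shard,
        count_files,
        ploidy_calls="/gatk/contig_ploidy_out/201to221-calls",
        output_dir="/gatk/cnv_caller_out",
        output_prefix=f"201to221-shard{i}",
    )
    for i, shard in enumerate(shard_lists, start=1)
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each shard then writes its own model and calls directories under the output directory, and those per-shard directories are what PostprocessGermlineCNVCalls takes as input when combining the results.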

  • ngeraldngerald Member
    Just to follow up:
    Splitting the 420k intervals into 17 shards of ~25k intervals each seems to work well!

    Thanks for the help!