DetermineGermlineContigPloidy: issue when using more samples

Hello, I have been testing the gCNV caller from GATK 4.1.0.0.

I was able to run the complete gCNV pipeline on 30 samples, but I would like to scale up to a larger dataset of 200 samples and am having trouble. The DetermineGermlineContigPloidy tool gives me errors when I try to use 200 samples. I have tried splitting the runs by chromosome and have still been unable to diagnose the problem.

gatk --java-options "-Xmx25G" DetermineGermlineContigPloidy \
-I CVH-1051.bam.counts.hdf5 \
.... (other 199 samples)
--contig-ploidy-priors ./contig_priors.tsv \
--output ../output/output.gatk.DGCP/ \
--output-prefix test_Data \
-verbosity DEBUG


12:21:12.899 DEBUG ScriptExecutor - --interval_list=/tmp/intervals1902729868733878616.tsv
12:21:12.899 DEBUG ScriptExecutor - --contig_ploidy_prior_table=/gpfs/gsfs10/users/islekda/projectCNV/gatk/contig_priors.tsv
12:21:12.899 DEBUG ScriptExecutor - --output_model_path=/gpfs/gsfs10/users/islekda/projectCNV/output/output.gatk.DGCP/test_Data-model
/data/islekda/conda/envs/gatk/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "/tmp/cohort_determine_ploidy_and_depth.4716223482773228095.py", line 86, in <module>
sample_metadata_collection, args.sample_coverage_metadata)
File "/data/islekda/conda/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_metadata.py", line 78, in read_sample_coverage_metadata
sample_name, n_j, contig_list))
File "/data/islekda/conda/envs/gatk/lib/python3.6/site-packages/gcnvkernel/structs/metadata.py", line 242, in add_sample_coverage_metadata
'Sample "{0}" already has coverage metadata annotations'.format(sample_name))
gcnvkernel.structs.metadata.SampleAlreadyInCollectionException: Sample "none" already has coverage metadata annotations
12:21:36.494 DEBUG ScriptExecutor - Result: 1
12:21:36.495 INFO DetermineGermlineContigPloidy - Shutting down engine
[March 11, 2019 12:21:36 PM EDT] org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy done. Elapsed time: 1.15 minutes.
Runtime.totalMemory()=3002597376
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 1
Command Line: python /tmp/cohort_determine_ploidy_and_depth.4716223482773228095.py --sample_coverage_metadata=/tmp/samples-by-coverage-per-contig5215066494113015797.tsv --output_calls_path=/gpfs/gsfs10/users/islekda/projectCNV/output/output.gatk.DGCP/test_Data-calls --mapping_error_rate=1.000000e-02 --psi_s_scale=1.000000e-04 --mean_bias_sd=1.000000e-02 --psi_j_scale=1.000000e-03 --learning_rate=5.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.990000e-01 --log_emission_samples_per_round=2000 --log_emission_sampling_rounds=100 --log_emission_sampling_median_rel_error=5.000000e-04 --max_advi_iter_first_epoch=1000 --max_advi_iter_subsequent_epochs=1000 --min_training_epochs=20 --max_training_epochs=100 --initial_temperature=2.000000e+00 --num_thermal_advi_iters=5000 --convergence_snr_averaging_window=5000 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=1 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=7.500000e-01 --disable_caller=false --disable_sampler=false --disable_annealing=false --interval_list=/tmp/intervals1902729868733878616.tsv --contig_ploidy_prior_table=/gpfs/gsfs10/users/islekda/projectCNV/gatk/contig_priors.tsv --output_model_path=/gpfs/gsfs10/users/islekda/projectCNV/output/output.gatk.DGCP/test_Data-model
at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.executeDeterminePloidyAndDepthPythonScript(DetermineGermlineContigPloidy.java:403)
at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.doWork(DetermineGermlineContigPloidy.java:283)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)

Best Answer

Answers

  • dislek, Member
    After checking, yes, there are multiple files that have the SM tag "none", but their sample names are supplied elsewhere. How would you suggest I remedy this?
  • slee, Member, Broadie, Dev

    If you don't want to reheader your BAMs, you could insert the correct sample names into the count files produced by the CollectReadCounts step. There are Python libraries that you can use to edit HDF5 files (e.g., h5py), but you might find it easier to recollect coverage and use --format TSV to output plaintext TSV files.
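
For anyone hitting the same error, here is a rough sketch of the TSV route in Python. It assumes that the plaintext count files carry a SAM-style header whose read-group line looks like `@RG<TAB>ID:GATKCopyNumber<TAB>SM:<sample name>`; please check one of your own files before relying on that. The function names (`read_sample_name`, `find_duplicate_names`, `set_sample_name`) are made up for this example and are not part of GATK.

```python
import re
from collections import Counter

# Assumption: the sample name lives in the SM tag of an @RG header line,
# e.g. "@RG\tID:GATKCopyNumber\tSM:none". Verify this against your own files.
SM_VALUE = re.compile(r"(?<=\tSM:)[^\t\n]*")


def read_sample_name(path):
    """Return the SM value from the first @RG header line, or None if absent."""
    with open(path) as handle:
        for line in handle:
            if not line.startswith("@"):
                break  # the header block is over, stop scanning
            if line.startswith("@RG"):
                match = SM_VALUE.search(line)
                return match.group(0) if match else None
    return None


def find_duplicate_names(paths):
    """Map each sample name shared by more than one file to its file count."""
    counts = Counter(read_sample_name(p) for p in paths)
    return {name: n for name, n in counts.items() if n > 1}


def set_sample_name(src, dst, new_name):
    """Copy src to dst, replacing the SM value on @RG header lines only."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith("@RG"):
                # A function is used as the replacement so that new_name
                # is inserted literally, without regex escape processing.
                line = SM_VALUE.sub(lambda _: new_name, line)
            fout.write(line)
```

Running `find_duplicate_names` over all count files first should show which names collide (here, presumably "none"); `set_sample_name` then writes a corrected copy per file, leaving the originals untouched. Each sample needs a unique name, since DetermineGermlineContigPloidy rejects a second file with a name it has already seen.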
