GermlineCNVCaller: transition matrix priors

wdecoster · University of Antwerp · Member

Hi,

I'm giving the beta GermlineCNVCaller a spin and downloaded the CN_transition_matrix files from the GATK bundle at

ftp.broadinstitute.org/bundle/beta/GermlineCNVCaller/copyNumberTransitionPriors/CN_transition_matrix_XX_X.tsv
(including the other files:
CN_transition_matrix_autosomal.tsv
CN_transition_matrix_XX_X.tsv
CN_transition_matrix_XX_Y.tsv
CN_transition_matrix_XY_X.tsv
CN_transition_matrix_XY_Y.tsv)

For example, the CN_transition_matrix_XX_X.tsv looks like this:

T_MATRIX_XX_X   FROM_0  FROM_1  FROM_2  FROM_3  FROM_4
TO_0    0.99966751443861601     0.0     4.6641935242897276e-08  0.0     0.0
TO_1    0.0     0.9997899779920747      9.3423238818193651e-08  0.0     0.0
TO_2    0.00033248556138398773  0.00021002200792527603  0.99999985473174158     4.541929579905158e-05   7.8833267638943636e-05
TO_3    0.0     0.0     5.0172599663674365e-09  0.99995458070420096     0.0
TO_4    0.0     0.0     1.8582444319879394e-10  0.0     0.99992116673236109

I have absolutely no idea which values in these files are reasonable, which is why I wanted to use the example files.
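For what it's worth, the files do look structurally sane: in each file, every FROM_* column sums to ~1, i.e. each column is the outgoing probability distribution of one copy-number state. A quick sanity check (my own one-liner, not something from the bundle):

    awk 'NR>1 {for (i=2; i<=NF; i++) s[i]+=$i} END {for (i=2; i<=NF; i++) printf "FROM_%d sums to %.12f\n", i-2, s[i]}' CN_transition_matrix_XX_X.tsv

For the file above this prints values within rounding error of 1.0 for all five columns.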
But when I run GermlineCNVCaller, I receive the following output, error, and stack trace:

Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx16g -jar /home/wdecoster/bin/gatk-4.beta.5/gatk-package-4.beta.5-local.jar GermlineCNVCaller --jobType LEARN_AND_CALL --input test_targetcoverage --contigAnnotationsTable hg19_contig_annotations.tsv --copyNumberTransitionPriorTable hg19_germline_CN_priors.old --outputPath test_results --sexGenotypeTable test.sexgenotype --targets processed_targets_mod.tsv --disableSpark true --rddCheckpointing false
20:11:54.679 WARN  SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly
20:11:56.274 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/wdecoster/bin/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/intel/gkl/native/libgkl_compression.so
[November 14, 2017 8:11:56 PM CET] GermlineCNVCaller  --input test_targetcoverage --contigAnnotationsTable hg19_contig_annotations.tsv --sexGenotypeTable test.sexgenotype --copyNumberTransitionPriorTable hg19_germline_CN_priors.old --outputPath test_results --jobType LEARN_AND_CALL --rddCheckpointing false --targets processed_targets_mod.tsv --disableSpark true  --targetSpecificVarianceUpperLimit 5.0 --sampleSpecificVarianceUpperLimit 1.0 --sampleSpecificVarianceSolverNumBisections 10 --targetSpecificVarianceSolverNumBisections 10 --sampleSpecificVarianceSolverRefinementDepth 3 --targetSpecificVarianceSolverRefinementDepth 3 --maximumEMIterations 50 --numLatents 5 --logLikelihoodTol 1.0E-5 --paramAbsoluteTolerance 1.0E-5 --posteriorAbsoluteTolerance 0.01 --maximumMStepCycles 1 --maximumEStepCycles 1 --logLikelihoodTolThresholdCopyRatioCalling 0.01 --logLikelihoodTolThresholdTargetSpecificVarianceSwitching 0.01 --logLikelihoodTolThresholdARDUpdate 0.01 --minimumLearningReadCount 5 --minimumPCAInitializationReadCount 10 --mappingErrorRate 0.001 --fourierRegularization false --minimumCNVLength 10 --maximumCNVLength 1000 --zeroPadFFT false --fourierRegularizationStrength 10000.0 --targetSpecificVarianceUpdateMode TARGET_RESOLVED --biasCovariateSolverType SPARK --copyRatioHMMType SPARK --maxTargetSpecificVarianceIterations 200 --maxBiasCovariatesIterations 200 --maxSampleSpecificVarianceIterations 200 --biasCovariatesAbsoluteTolerance 1.0E-8 --biasCovariatesRelativeTolerance 1.0E-6 --targetSpecificVarianceAbsoluteTolerance 1.0E-8 --targetSpecificVarianceRelativeTolerance 1.0E-6 --sampleSpecificVarianceAbsoluteTolerance 1.0E-8 --sampleSpecificVarianceRelativeTolerance 1.0E-6 --rddCheckpointingInterval 10 --runCheckpointingInterval 1 --biasCovariatesCommunicationPolicy RDD_JOIN --meanFieldAdmixingRatio 1.0 --runCheckpointing false --sampleSpecificVarianceUpdate true --targetSpecificVarianceUpdate true --adaptiveTargetSpecificVarianceUpdateModeSwitching false --copyRatioUpdate true --rddCheckpointingPath /dev/null --runCheckpointingPath /dev/null --extendedPosteriorOutputEnabled true --numTargetSpacePartitions 1 --ARDUpdate true --initialARDPrecisionAbsolute 1.0E-8 --initialARDPrecisionRelativeToNoise 0.001 --maxARDPrecision 1.0E100 --targetSpecificVarianceSolverNumThreads 1 --modelInitializationStrategy PCA --includeARDInLogLikelihood false --sparkMaster local[*] --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20
[November 14, 2017 8:11:56 PM CET] Executing as [email protected] on Linux 2.6.32-642.15.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b15; Version: 4.beta.5
20:11:57.173 INFO  GermlineCNVCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
20:11:57.174 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
20:11:57.174 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
20:11:57.175 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
20:11:57.175 INFO  GermlineCNVCaller - Deflater: IntelDeflater
20:11:57.175 INFO  GermlineCNVCaller - Inflater: IntelInflater
20:11:57.175 INFO  GermlineCNVCaller - GCS max retries/reopens: 20
20:11:57.175 INFO  GermlineCNVCaller - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
20:11:57.175 INFO  GermlineCNVCaller - Initializing engine
20:11:57.175 INFO  GermlineCNVCaller - Done initializing engine
20:11:57.176 INFO  GermlineCNVCaller - Spark disabled.  sparkMaster option (local[*]) ignored.
20:12:54.643 INFO  GermlineCNVCaller - Parsing the read counts table...
20:12:56.399 INFO  GermlineCNVCaller - Parsing the sample sex genotypes table...
20:12:56.418 INFO  GermlineCNVCaller - Parsing the germline contig ploidy annotation table...
20:12:56.423 INFO  ContigGermlinePloidyAnnotationTableReader - Ploidy tags: SEX_XX, SEX_XY
20:12:57.517 INFO  GermlineCNVCaller - Parsing the copy number transition prior table and initializing the caches...
20:12:57.519 INFO  GermlineCNVCaller - Shutting down engine
[November 14, 2017 8:12:57 PM CET] org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller done. Elapsed time: 1.02 minutes.
Runtime.totalMemory()=1665138688
java.lang.IllegalArgumentException: The parent path of the integer copy number transition matrix collection table file must be non-null
    at org.broadinstitute.hellbender.utils.Utils.nonNull(Utils.java:554)
    at org.broadinstitute.hellbender.tools.exome.germlinehmm.IntegerCopyNumberTransitionMatrixCollection$IntegerCopyNumberTransitionMatrixCollectionReader.read(IntegerCopyNumberTransitionMatrixCollection.java:266)
    at org.broadinstitute.hellbender.tools.exome.germlinehmm.IntegerCopyNumberTransitionMatrixCollection.read(IntegerCopyNumberTransitionMatrixCollection.java:131)
    at org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller.createIntegerCopyNumberTransitionProbabilityCacheCollection(GermlineCNVCaller.java:361)
    at org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller.runPipeline(GermlineCNVCaller.java:304)
    at org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller.doWork(GermlineCNVCaller.java:258)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
    at org.broadinstitute.hellbender.Main.main(Main.java:233)

Do you have suggestions on how to fix this?

Issue · GitHub (filed by Sheila)
Issue Number: 2687 · State: closed · Closed By: sooheelee


Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @wdecoster
    Hi,

    I will check with the team and get back to you.

    -Sheila

  • shlee · Cambridge · Member, Broadie, Moderator admin
    Accepted Answer

    Hi @wdecoster ,

    For --copyNumberTransitionPriorTable try using the absolute path or add ./ prior to the filename.
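    For example, with the file from your command echo (the absolute path below is illustrative):

    # bare filename that triggered the "parent path ... must be non-null" error:
    --copyNumberTransitionPriorTable hg19_germline_CN_priors.old
    # absolute path:
    --copyNumberTransitionPriorTable /home/wdecoster/wes/GermlineCNVCaller/hg19_germline_CN_priors.old
    # or an explicit relative path:
    --copyNumberTransitionPriorTable ./hg19_germline_CN_priors.old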

  • wdecoster · University of Antwerp · Member
    edited November 2017

    @shlee Using the absolute path solved this issue, thanks! Odd, since the file was right there in the current directory. I accidentally clicked "No" on the "Did this answer the question?" button - sorry about that!

    I now run into another error message, which I find a bit cryptic. It's probably self-explanatory, but it's not obvious to me right now:

    Some of the sample sex genotypes do not have germline contig ploidy annotations: XX
    (full trace below)

    My "hg19_contig_annotations.tsv" looks like:

    [[email protected] GermlineCNVCaller]$ less hg19_contig_annotations.tsv
    CONTIG  CLASS   SEX_XX  SEX_XY
    chr1    AUTOSOMAL       2       2
    chr2    AUTOSOMAL       2       2
    chr3    AUTOSOMAL       2       2
    chr4    AUTOSOMAL       2       2
    chr5    AUTOSOMAL       2       2
    chr6    AUTOSOMAL       2       2
    chr7    AUTOSOMAL       2       2
    chr8    AUTOSOMAL       2       2
    chr9    AUTOSOMAL       2       2
    chr10   AUTOSOMAL       2       2
    chr11   AUTOSOMAL       2       2
    chr12   AUTOSOMAL       2       2
    chr13   AUTOSOMAL       2       2
    chr14   AUTOSOMAL       2       2
    chr15   AUTOSOMAL       2       2
    chr16   AUTOSOMAL       2       2
    chr17   AUTOSOMAL       2       2
    chr18   AUTOSOMAL       2       2
    chr19   AUTOSOMAL       2       2
    chr20   AUTOSOMAL       2       2
    chr21   AUTOSOMAL       2       2
    chr22   AUTOSOMAL       2       2
    chrX    ALLOSOMAL       2       1
    chrY    ALLOSOMAL       0       1
    

    And my test.sexgenotype (small test dataset):

    SAMPLE_NAME     SEX_GENOTYPE
    D10012_8_66000_hg19s    XX
    D10406_3_66001_hg19s    XX
    

    Can you spot something wrong?

    Cheers,
    Wouter

    Full trace:

    22:07:51.963 INFO  GermlineCNVCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    22:07:51.963 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    22:07:51.963 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    22:07:51.963 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    22:07:51.963 INFO  GermlineCNVCaller - Deflater: IntelDeflater
    22:07:51.963 INFO  GermlineCNVCaller - Inflater: IntelInflater
    22:07:51.963 INFO  GermlineCNVCaller - GCS max retries/reopens: 20
    22:07:51.963 INFO  GermlineCNVCaller - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    22:07:51.963 INFO  GermlineCNVCaller - Initializing engine
    22:07:51.963 INFO  GermlineCNVCaller - Done initializing engine
    22:07:51.964 INFO  GermlineCNVCaller - Spark disabled.  sparkMaster option (local[*]) ignored.
    22:07:53.249 INFO  GermlineCNVCaller - Parsing the read counts table...
    22:07:54.726 INFO  GermlineCNVCaller - Parsing the sample sex genotypes table...
    22:07:54.743 INFO  GermlineCNVCaller - Parsing the germline contig ploidy annotation table...
    22:07:54.747 INFO  ContigGermlinePloidyAnnotationTableReader - Ploidy tags: SEX_XX, SEX_XY
    22:07:55.704 INFO  GermlineCNVCaller - Parsing the copy number transition prior table and initializing the caches...
    log4j:WARN No appenders could be found for logger (org.nd4j.nativeblas.NativeOps).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    22:07:59.397 INFO  GermlineCNVCaller - Initializing the EM algorithm workspace...
    22:08:00.123 INFO  GermlineCNVCaller - Shutting down engine
    [November 22, 2017 10:08:00 PM CET] org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller done. Elapsed time: 0.14 minutes.
    Runtime.totalMemory()=2702704640
    java.lang.IllegalArgumentException: Some of the sample sex genotypes do not have germline contig ploidy annotations: XX
            at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:681)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMWorkspace.validateWorkspaceArgs(CoverageModelEMWorkspace.java:469)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMWorkspace.<init>(CoverageModelEMWorkspace.java:314)
            at org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller.runPipeline(GermlineCNVCaller.java:318)
            at org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller.doWork(GermlineCNVCaller.java:258)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
            at org.broadinstitute.hellbender.Main.main(Main.java:233)
    
  • shlee · Cambridge · Member, Broadie, Moderator admin
    Accepted Answer

    Hi @wdecoster,

    For your test.sexgenotype, I think perhaps the SEX_GENOTYPE should be chrXchrX. Alternatively, the contig naming scheme in hg19_contig_annotations.tsv could drop the 'chr'. The code takes things rather literally.
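    For instance, lined up with each other the two files might look like this (a sketch only; the point is that the genotype strings and the ploidy tag column headers are spelled identically):

    hg19_contig_annotations.tsv, with the tag columns renamed:

    CONTIG  CLASS   chrXchrX        chrXchrY
    chr1    AUTOSOMAL       2       2
    (autosomal rows chr2-chr22 unchanged)
    chrX    ALLOSOMAL       2       1
    chrY    ALLOSOMAL       0       1

    test.sexgenotype, with matching genotype strings:

    SAMPLE_NAME     SEX_GENOTYPE
    D10012_8_66000_hg19s    chrXchrX
    D10406_3_66001_hg19s    chrXchrX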

    If this doesn't solve the issue, please let us know. I am able to change your designation of the former answer. Thanks for the feedback that the absolute path solved the previous issue.

  • wdecoster · University of Antwerp · Member

    @shlee thanks indeed - it turns out it should have been chrXchrX. I didn't expect this field to get parsed by the tool; I assumed it was just a gender identifier and that I could as well have chosen Alice and Bob. I don't want to know how many developer/user hours have been wasted on databases not agreeing on chromosome identifiers. Anyway, I changed this to chrXchrX and chrXchrY there and everywhere else I could find something similar.

    I wish I could tell you it works now, but unfortunately I ran into a new error:

        org.broadinstitute.hellbender.exceptions.GATKException: Nd4j data type must be set to double for coverage modeller routines to function properly. This can be done by setting JVM system property "dtype" to "double". Can not continue.

    It turns out this was already reported, and adding -Ddtype=double solved it. But the error message wasn't very clear on this (I first tried -dtype=double and then turned to googling).
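    For anyone else hitting this, the property goes among the JVM options (note the -D prefix), e.g.:

    # via the gatk-launch wrapper:
    gatk-launch --javaOptions "-Ddtype=double" GermlineCNVCaller [arguments as before]
    # or when invoking the jar directly:
    java -Ddtype=double -Xmx16g -jar gatk-package-4.beta.5-local.jar GermlineCNVCaller [arguments as before]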

    Next error is: java.lang.IllegalArgumentException: Number of bias latent variables must be strictly less than the number of samples (full stack below).

    Do you have a suggestion on how to fix this one? Is this because I just took a silly test set of 2 samples?

    Executing as [email protected] on Linux 2.6.32-642.15.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b15; Version: 4.beta.5
    15:33:36.994 INFO  GermlineCNVCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    15:33:36.994 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    15:33:36.994 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    15:33:36.995 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    15:33:36.995 INFO  GermlineCNVCaller - Deflater: IntelDeflater
    15:33:36.995 INFO  GermlineCNVCaller - Inflater: IntelInflater
    15:33:36.995 INFO  GermlineCNVCaller - GCS max retries/reopens: 20
    15:33:36.995 INFO  GermlineCNVCaller - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    15:33:36.995 INFO  GermlineCNVCaller - Initializing engine
    15:33:36.995 INFO  GermlineCNVCaller - Done initializing engine
    15:33:36.996 INFO  GermlineCNVCaller - Spark disabled.  sparkMaster option (local[*]) ignored.
    15:34:00.539 INFO  GermlineCNVCaller - Parsing the read counts table...
    15:34:02.196 INFO  GermlineCNVCaller - Parsing the sample sex genotypes table...
    15:34:02.218 INFO  GermlineCNVCaller - Parsing the germline contig ploidy annotation table...
    15:34:02.224 INFO  ContigGermlinePloidyAnnotationTableReader - Ploidy tags: chrXchrY, chrXchrX
    15:34:03.358 INFO  GermlineCNVCaller - Parsing the copy number transition prior table and initializing the caches...
    log4j:WARN No appenders could be found for logger (org.nd4j.nativeblas.NativeOps).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    15:34:37.634 INFO  GermlineCNVCaller - Initializing the EM algorithm workspace...
    15:34:46.121 INFO  CoverageModelEMWorkspace - No column dropped due to bad values
    15:34:46.828 INFO  CoverageModelEMWorkspace - Some targets dropped (3854 out of 348298, 1.11%) as they were flagged more than 1 times across 2 samples as being masked for learning.
    15:34:52.403 INFO  CoverageModelEMWorkspace - Number of samples before and after pre-processing read counts: (2, 2)
    15:34:52.403 INFO  CoverageModelEMWorkspace - Number of targets before and after pre-processing read counts: (348298, 344444)
    15:34:52.406 INFO  CoverageModelEMWorkspace - Initializing a local compute block
    15:34:52.437 INFO  CoverageModelEMWorkspace - Collecting coverage data to the driver node...
    15:35:23.745 INFO  CoverageModelEMWorkspace - Pushing coverage data to worker(s)...
    15:35:24.092 INFO  CoverageModelEMWorkspace - Calculating copy ratio priors on the driver node...
    15:35:29.134 INFO  CoverageModelEMWorkspace - Pushing posteriors to worker(s)...
    15:35:29.242 INFO  GermlineCNVCaller - Shutting down engine
    [December 4, 2017 3:35:29 PM CET] org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller done. Elapsed time: 1.87 minutes.
    Runtime.totalMemory()=3118989312
    java.lang.IllegalArgumentException: Number of bias latent variables must be strictly less than the number of samples
            at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:681)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMWorkspace.initializeWorkerModelParameters(CoverageModelEMWorkspace.java:731)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMWorkspace.<init>(CoverageModelEMWorkspace.java:422)
            at org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller.runPipeline(GermlineCNVCaller.java:318)
            at org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller.doWork(GermlineCNVCaller.java:258)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
            at org.broadinstitute.hellbender.Main.main(Main.java:233)
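    Edit: from the command echo in my first post I see --numLatents defaults to 5, so with only 2 samples this check can never pass. My untested guess is that, besides adding samples, explicitly lowering --numLatents below the sample count would get past it, e.g.:

    gatk-launch --javaOptions "-Ddtype=double -Xmx16g" GermlineCNVCaller \
     --jobType LEARN_AND_CALL \
     --numLatents 1 \
     --input test_targetcoverage \
     --contigAnnotationsTable hg19_contig_annotations.tsv \
     --copyNumberTransitionPriorTable /home/wdecoster/wes/GermlineCNVCaller/hg19_germline_CN_priors.old \
     --outputPath test_results \
     --sexGenotypeTable test.sexgenotype \
     --targets processed_targets_mod.tsv \
     --disableSpark true \
     --rddCheckpointing false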
    
  • wdecoster · University of Antwerp · Member

    Thanks, I'll start again with more samples. I went with two samples to test whether my configuration files and commands were correct - glad I could already iron out some mistakes with those.

  • shlee · Cambridge · Member, Broadie, Moderator admin

    @wdecoster, let me know if you are analyzing exome data, and if so, whether you are interested in the parameters I set for my panel of normals.

  • wdecoster · University of Antwerp · Member

    I am indeed using exome sequencing data (625 samples). But all of those are patients, so I use --jobType LEARN_AND_CALL, i.e. without a PoN. I expect rare CNVs - not all patients have the same disease - so it would be problematic to select "normal samples". Does this sound reasonable, or did I misunderstand something?

  • shlee · Cambridge · Member, Broadie, Moderator admin

    @wdecoster, so long as you expect each patient sample to exhibit unique events, your cohort as a whole should serve well enough as normal samples. Any outliers in the data collection (sample-wise or target-wise) will be trimmed, i.e. they should not contribute to the panel of normals. --jobType LEARN_AND_CALL will concurrently create your PoN and call on each of the samples. You can subsequently use this PoN in the call-only mode for new samples.
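    A sketch of that follow-up mode, for completeness - the CALL_ONLY job type and the model-path argument name are from memory, so please check GermlineCNVCaller --help in your build before relying on them, and the model directory and new-sample file names below are hypothetical:

    gatk-launch --javaOptions "-Ddtype=double -Xmx100g" GermlineCNVCaller \
     --jobType CALL_ONLY \
     --inputModelPath results/model_final \
     --input new_sample_coverage.txt \
     --contigAnnotationsTable hg19_contig_annotations.tsv \
     --copyNumberTransitionPriorTable /home/wdecoster/wes/GermlineCNVCaller/hg19_germline_CN_priors.tsv \
     --outputPath call_results \
     --sexGenotypeTable new_samples.gender.txt \
     --targets processed_targets_mod.tsv \
     --disableSpark true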

  • wdecoster · University of Antwerp · Member

    Hi @shlee, happy new year to you and the rest of the team.

    After running quite a long time I now encounter the Java error printed below, do you have suggestions on how to solve this?

    The server I'm using has 500 GB of RAM, but perhaps I need to increase one of the ulimits?

    Runtime.totalMemory()=37407948800
    Exception in thread "main" java.lang.OutOfMemoryError: Cannot allocate 37663204028 + 1741475000 bytes (> Pointer.maxBytes)
            at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:486)
            at org.bytedeco.javacpp.Pointer.init(Pointer.java:120)
            at org.bytedeco.javacpp.DoublePointer.allocateArray(Native Method)
            at org.bytedeco.javacpp.DoublePointer.<init>(DoublePointer.java:68)
            at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:452)
            at org.nd4j.linalg.api.buffer.DoubleBuffer.<init>(DoubleBuffer.java:52)
            at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.createDouble(DefaultDataBufferFactory.java:228)
            at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1229)
            at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:243)
            at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:124)
            at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.createUninitialized(CpuNDArrayFactory.java:252)
            at org.nd4j.linalg.factory.Nd4j.createUninitialized(Nd4j.java:4357)
            at org.nd4j.linalg.api.shape.Shape.toOffsetZeroCopyHelper(Shape.java:152)
            at org.nd4j.linalg.api.shape.Shape.toOffsetZeroCopy(Shape.java:108)
            at org.nd4j.linalg.api.ndarray.BaseNDArray.dup(BaseNDArray.java:1473)
            at org.nd4j.linalg.api.ndarray.BaseNDArray.sub(BaseNDArray.java:2547)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMComputeBlock.getReadDepthLatentPosteriorData(CoverageModelEMComputeBlock.java:593)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMWorkspace.lambda$updateReadDepthPosteriorExpectations$c0e4deb5$1(CoverageModelEMWorkspace.java:1095)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMWorkspace.mapWorkersAndReduce(CoverageModelEMWorkspace.java:2192)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMWorkspace.updateReadDepthPosteriorExpectations(CoverageModelEMWorkspace.java:1094)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMWorkspace.updateReadDepthPosteriorExpectations(CoverageModelEMWorkspace.java:1137)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMAlgorithm.updateReadDepthLatentPosteriorExpectations(CoverageModelEMAlgorithm.java:426)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMAlgorithm.runRoutine(CoverageModelEMAlgorithm.java:108)
            at org.broadinstitute.hellbender.tools.coveragemodel.CoverageModelEMAlgorithm.runExpectationMaximization(CoverageModelEMAlgorithm.java:160)
            at org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller.runPipeline(GermlineCNVCaller.java:326)
            at org.broadinstitute.hellbender.tools.coveragemodel.germline.GermlineCNVCaller.doWork(GermlineCNVCaller.java:258)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
            at org.broadinstitute.hellbender.Main.main(Main.java:233)
    
  • shlee · Cambridge · Member, Broadie, Moderator admin

    Happy New Year to you too @wdecoster. Can you also post the command you are running? E.g., are you specifying any Java memory options, such as -Xmx, that limit the memory? Also, do you have a good amount of free storage for garbage collection?

  • wdecoster · University of Antwerp · Member

    Right, I added the command below. Should I increase the -Xmx or leave it out altogether? I have ~4 TB left; I would guess that should be enough?

    gatk-launch --javaOptions "-Ddtype=double -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx100g" GermlineCNVCaller \
     --jobType LEARN_AND_CALL \
     --input target_coverage.txt \
     --contigAnnotationsTable hg19_contig_annotations.tsv \
     --copyNumberTransitionPriorTable /home/wdecoster/wes/GermlineCNVCaller/hg19_germline_CN_priors.tsv \
     --outputPath results \
     --sexGenotypeTable samples.gender.txt \
     --targets processed_targets_mod.tsv \
     --disableSpark true \
     --rddCheckpointing false
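    (Looking at the trace again: the limit being hit is JavaCPP's Pointer.maxBytes, i.e. the off-heap allocation cap that ND4J uses, rather than the Java heap itself. If I understand the JavaCPP system properties correctly - my own guess, not something from this thread - raising that cap on top of -Xmx would look like this:)

    gatk-launch --javaOptions "-Ddtype=double -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx100g -Dorg.bytedeco.javacpp.maxbytes=200G -Dorg.bytedeco.javacpp.maxphysicalbytes=400G" GermlineCNVCaller \
     --jobType LEARN_AND_CALL \
     --input target_coverage.txt \
     --contigAnnotationsTable hg19_contig_annotations.tsv \
     --copyNumberTransitionPriorTable /home/wdecoster/wes/GermlineCNVCaller/hg19_germline_CN_priors.tsv \
     --outputPath results \
     --sexGenotypeTable samples.gender.txt \
     --targets processed_targets_mod.tsv \
     --disableSpark true \
     --rddCheckpointing false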
    
  • shlee · Cambridge · Member, Broadie, Moderator admin

    Hi @wdecoster,

    As I mentioned before, I was able to run 40 WES BAMs from the 1000 Genomes Project using -Xmx100g. I believe I had ~500 GB of storage, and that run completed in 6 hours. If you weren't specifying the -Xmx parameter before, then this should help now.

    Fyi, the developers are working to make gCNV calling much more memory efficient.

  • wdecoster · University of Antwerp · Member

    Looking forward to that - it's using a lot of memory here (varying, up to 400 GB of RAM).
    I'm using GATK 4.beta.5; is there already an improvement in the soon-to-be-available GATK4?

    It has now been running for days and days, showing output as in the screenshot below.
    Is this normal? Of course, 625 BAMs will take longer than your dataset. Maybe I should select a smaller set for the panel of normals?

  • Sheila · Broad Institute · Member, Broadie, Moderator admin
    edited January 2018

    @wdecoster
    Hi,

    It seems there are indeed speed and memory improvements in version 4.0.0.0 :smile: However, you should wait for our GATK forum post/documentation page about running the tool in the latest release. It requires a mini-tutorial, as the arguments and tool usage have changed quite a bit since beta. You can expect that to come out in the next week.

    -Sheila

    P.S. Have you tried running the WDL script we provide here?

  • wdecoster · University of Antwerp · Member

    Hi Sheila,

    Thanks for the information. Looking forward to 4.0.0.0 and that forum post then! I'll leave the job running for now, until I annoy the other users too much - I'm taking up 442 GB of memory at the moment :)

    I haven't tested WDL, I guess I should have a look at it sooner or later.

    Cheers,
    Wouter

  • slee · Member, Broadie, Dev ✭✭
    edited January 2018

    Hi @wdecoster,

    Great to hear that you've been trying to run the Spark version of gCNV in the beta release. As @Sheila pointed out, we have significantly revamped this tool for the 4.0.0.0 release. In particular, the model has been expanded to cover both rare and common CNVs, and we now perform inference using the Python package pymc3.

    However, I think you may still be able to generate decent results with your current run of the beta release if you simply limit the number of samples in the PoN and use this PoN to call the rest of your samples, as @shlee mentioned earlier. I would wager that ~100 samples in the PoN would probably be sufficient, but you can certainly adjust if necessary, using your previous runs and memory usage as a gauge.

    If you'd like to try the new version, the WDL that @Sheila pointed to is a good place to start. In any case, thanks for taking gCNV for a spin! We'll have some blog posts and tutorial materials---as well as some Best Practices recommendations for priors and parameter values---coming soon, so stay tuned.

  • wdecoster · University of Antwerp · Member

    Hi @slee,

    Sounds great, I'll wait a bit for the documentation then!

    Cheers,
    Wouter
